Abstract

A model, called the linear transform network (LTN), is proposed to analyze the compression and 
estimation of correlated signals transmitted over directed acyclic graphs (DAGs). An LTN is a DAG network 
with multiple source and receiver nodes. Source nodes transmit subspace projections of random correlated 
signals by applying reduced-dimension linear transforms. The subspace projections are linearly processed by
multiple relays and routed to intended receivers. Each receiver applies a linear estimator to approximate a 
subset of the sources with minimum mean squared error (MSE) distortion. The model is extended to include 
noisy networks with power constraints on transmitters. A key task is to compute all local compression 
matrices and linear estimators in the network to minimize end-to-end distortion. The non-convex problem 
is solved iteratively within an optimization framework using constrained quadratic programs (QPs). The 
proposed algorithm recovers as special cases the regular and distributed Karhunen-Loeve transforms (KLTs). 
Cut-set lower bounds on the distortion region of multi-source, multi-receiver networks are given for linear 
coding based on convex relaxations. Cut-set lower bounds are also given for any coding strategy based on 
information theory. The distortion region and compression-estimation tradeoffs are illustrated for different 
communication demands (e.g. multiple unicast), and graph structures.
Index Terms

Karhunen-Loeve transform (KLT), linear transform network (LTN), quadratic program (QP), cut-set 
bound. 

I. Introduction 

The compression and estimation of an observed signal via subspace projections is both a classical
and current topic in signal processing and communication. While random subspace projections have
received considerable attention in the compressed sensing literature [1], subspace projections optimized
for minimal distortion are important for many applications. The Karhunen-Loeve transform (KLT) and
its empirical form, principal component analysis (PCA), are widely studied in computer vision, biology,
signal processing, and information theory. Reduced-dimensionality representations are useful for source
coding, noise filtering, compression, clustering, and data mining. Specific examples include eigenfaces for
face recognition, orthogonal decomposition in transform coding, and sparse PCA for gene analysis [2]-[4].

In contemporary applications such as wireless sensor networks (WSNs) and distributed databases, data is 
available and collected in different locations. In a WSN, sensors are usually constrained by limited power and 
bandwidth resources. This has motivated existing approaches to take into account correlations across high-
dimensional sensor data to reduce transmission requirements (see e.g. [5]-[11]). Rather than transmitting raw
sensor data to a fusion center to approximate a global signal, sensor nodes carry out local data dimensionality 
reduction to increase bandwidth and energy efficiency. 

In the present paper, we propose a linear transform network (LTN) model to analyze dimensionality
reduction for compression-estimation of correlated signals in multi-hop networks. In a centralized setting,
given a random source signal $\mathbf{x}$ with zero mean and covariance matrix $\Sigma_x$, applying the KLT to $\mathbf{x}$ yields
uncorrelated components in the eigenvector basis of $\Sigma_x$. The optimal linear least squares $k$-th order
approximation of the source is given by the $k$ components corresponding to the $k$ largest eigenvalues of $\Sigma_x$. In
a network setting, multiple correlated signals are observed by different source nodes. The source nodes
transmit low-dimensional subspace projections (approximations of the source) to intended receivers via a
relay network. The compression-estimation problem is to optimize the subspace projections computed by all
nodes in order to minimize the end-to-end distortion at receiver nodes.
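
The centralized KLT statement above can be checked numerically. The following is a minimal sketch, assuming a randomly generated covariance matrix and hypothetical dimensions `n` and `k` that do not come from the paper; it verifies that the retained components are uncorrelated and that the resulting MSE equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2                            # hypothetical source dimension and approximation order
A = rng.standard_normal((n, n))
Sigma_x = A @ A.T                      # hypothetical source covariance (symmetric PSD)

# eigh returns eigenvalues in ascending order, so re-sort them in descending order.
eigvals, eigvecs = np.linalg.eigh(Sigma_x)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

U_k = eigvecs[:, :k]                   # eigenvectors of the k largest eigenvalues
coeff_cov = U_k.T @ Sigma_x @ U_k      # covariance of the k KLT coefficients

# Compress with U_k^T and reconstruct with U_k, i.e. x_hat = U_k U_k^T x.
R = np.eye(n) - U_k @ U_k.T
mse = np.trace(R @ Sigma_x @ R.T)      # E||x - x_hat||^2
discarded = eigvals[k:].sum()          # sum of the n - k smallest eigenvalues
print(mse, discarded)
```

The two printed values should agree up to floating-point error, which is the sense in which the $k$ leading components are optimal in the centralized case.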

In our model, receivers estimate random vectors based on "one-shot" linear analog-amplitude multisensor 
observations. The restriction to "one-shot", zero-delay encoding of each vector of source observations 
separately is interesting due to severe complexity limitations in many applications (e.g. sensor networks).
Linear coding depends on first-order and second-order statistics and is robust to uncertainty in the precise
probabilistic distribution of the sources. Under the assumption of ideal channels between nodes, our task 
is to optimize signal subspaces given limited bandwidth in terms of the number of real-valued messages 
communicated. Our results extend previous work on distributed estimation in this case Q-IH. For the case 
of dimensionality-reduction with noisy channel communication (see e.g. [6]), the task is to optimize signal
subspaces subject to channel noise and power constraints. 

For noisy networks, the general communication problem is often referred to as the joint source-channel- 
network coding problem in the information-theoretic literature and is a famously open problem. Beyond the 
zero-delay, linear dimensionality-reduction considered here, end-to-end performance in networks could be 
improved by (i) non-linear strategies and (ii) allowing a longer coding horizon. Partial progress includes
non-linear low-delay mappings for only simple network scenarios [12]-[14]. For the case of an infinite
coding horizon, separation theorems for decomposing the joint communication problem have been analyzed
by [15]-[17].

A. Related Work 

Directly related to our work in networks is the distributed KLT problem. Distributed linear transforms
were introduced by Gastpar et al. for the compression of jointly Gaussian sources using iterative methods [5],
[18]. Simultaneous work by Zhang et al. for multi-sensor data fusion also resulted in iterative procedures HI.
An alternate proof based on innovations for second-order random variables with arbitrary distributions was
given by [19]. The problem was extended to non-Gaussian sources, including channel fading and noise
effects to model the non-ideal link from sensors to decoder, by Schizas et al. [6]. Roy and Vetterli provide
an asymptotic distortion analysis of the distributed KLT for the case in which the dimension of the source and
observation vectors approaches infinity [20]. Finally, Xiao et al. analyze linear transforms for distributed
coherent estimation [7].

Much of the estimation-theoretic literature deals with single-hop networks; each sensor relays information 
directly to a fusion center. In multi-hop networks, linear operations are performed by successive relays to 
aggregate, compress, and redistribute correlated signals. The LTN model relates to recent work on routing and 
network coding (Ahlswede et al. [21]). In pure routing solutions, intermediate nodes either forward or drop
packets. The corresponding analogy in the LTN model is to constrain transforms to be essentially identity
transforms. However, network coding (over finite fields) has shown that mixing of data at intermediate nodes
achieves higher rates in the multicast setting (see [22] regarding the sufficiency of linear codes and [23] for
multicast code construction). Similarly in the LTN model, linear combining of subspace projections (over
the real field) at intermediate nodes improves decoding performance. Lastly, the max-flow min-cut theorem
of Ford-Fulkerson [24] provides the basis for cut-set lower bounds in networks.

The LTN model is partially related to the formulation of Koetter and Kschischang [25] modeling
information transmission as the injection of a basis for a vector space into the network, and subspace codes [26].
If arbitrary data exchange is permitted between network nodes, the compression-estimation problem is
related to estimation in graphical models (e.g. decomposable PCA [27], and tree-based transforms
(tree-KLT) [28]). Other related work involving signal projections in networks includes joint source-channel
communication in sensor networks [29], random projections in a gossip framework [30], and distributed
compressed sensing [31].

B. Summary of Main Results 

We cast the network compression-estimation problem as a statistical signal processing and constrained 
optimization problem. For most networks, the optimization is non-convex. Therefore, our main results are 
divided into two categories: (i) Iterative solutions for linear transform coding over acyclic networks; (ii) 
Cut-set bounds based on convex relaxations and cut-set bounds based on information theory. 

• Section III reviews linear signal processing in networks. Section IV outlines an iterative optimization
for compression-estimation matrices in ideal networks under a local convergence criterion.

• Section V analyzes an iterative optimization method involving constrained quadratic programs for noisy
networks with power allocation over subspaces.

• Section VI introduces cut-set lower bounds to benchmark the minimum mean square error (MSE) for
linear coding based on convex relaxations such as a semi-definite program (SDP) relaxation.

• Section VI-F describes cut-set lower bounds for any coding strategy in networks based on
information-theoretic principles of source-channel separation. The lower bounds are plotted for a distributed noisy
network.

• Sections IV-VII provide examples illustrating the tradeoffs between compression and estimation; upper
and lower bounds are illustrated for an aggregation (tree) network, butterfly network, and distributed
noisy network.

C. Notation 

Boldface upper case letters denote matrices, boldface lower case letters denote column vectors, and 
calligraphic upper case letters denote sets. The $\ell_2$-norm of a vector $\mathbf{x} \in \mathbb{R}^n$ is defined as $\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$.
The weighted $\ell_2$-norm is $\|\mathbf{x}\|_{\mathbf{W}} = \|\mathbf{W}\mathbf{x}\|_2$, where $\mathbf{W}$ is a positive semi-definite matrix (written $\mathbf{W} \succeq 0$).

Fig. 1. (a) Linear Transform Network: An LTN model with source nodes $\{v_1, v_2\}$ and receivers $\{v_5, v_6\}$. Source nodes observe
vector signals $\{\mathbf{x}_1, \mathbf{x}_2\}$. All encoding nodes linearly process received signals using a transform $\mathbf{L}_{ij}$. Receivers $v_5$ and $v_6$ compute
LLSE estimates $\hat{\mathbf{r}}_5$ and $\hat{\mathbf{r}}_6$ of desired signals $\mathbf{r}_5$ and $\mathbf{r}_6$. (b) Signal Flow Graph: Linear processing of source signals $\{\mathbf{x}_1, \mathbf{x}_2\}$ results
in signals transmitted along edges of the graph.

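Let $(\cdot)^T$, $(\cdot)^{-1}$, and $\mathrm{tr}(\cdot)$ denote matrix transpose, inverse, and trace respectively. Let $\mathbf{A} \otimes \mathbf{B}$ denote the
Kronecker matrix product of two matrices. The matrix $\mathbf{I}_\ell$ denotes the $\ell \times \ell$ identity. For $\ell \geq k$, the notation
$\mathbf{T}_{k:\ell} = \mathbf{T}_k \mathbf{T}_{k+1} \cdots \mathbf{T}_\ell$ denotes the product of $(\ell - k + 1)$ matrices. A matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ is written in vector
form $\mathrm{vec}(\mathbf{X}) \in \mathbb{R}^{mn}$ by stacking its columns; i.e. $\mathrm{vec}(\mathbf{X}) = [\mathbf{x}_1; \mathbf{x}_2; \ldots; \mathbf{x}_n]$ where $\mathbf{x}_j$ is the $j$-th column
of $\mathbf{X}$. For random vectors, $E[\cdot]$ denotes the expectation, and $\Sigma_x = E[\mathbf{x}\mathbf{x}^T]$ denotes the covariance matrix of
the zero-mean random vector $\mathbf{x}$.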

II. Problem Statement 

Fig. 1 serves as an extended example of an LTN graph. The network is comprised of two sources, two
relays, and two receiver nodes. 

Definition 1 (Relay Network): Consider a relay network modeled by a directed acyclic graph (DAG) $G = (\mathcal{V}, \mathcal{E})$
and a set of weights $\mathcal{C}$. The set $\mathcal{V} = \{v_1, v_2, \ldots, v_{|\mathcal{V}|}\}$ is the vertex/node set, $\mathcal{E} \subset \{1, \ldots, |\mathcal{V}|\} \times \{1, \ldots, |\mathcal{V}|\}$
is the edge set, and $\mathcal{C} = \{c_{ij} \in \mathbb{Z}_+ : (i,j) \in \mathcal{E}\}$ is the set of weights. Each edge $(i,j) \in \mathcal{E}$
represents a communication link with integer bandwidth $c_{ij}$ from node $v_i$ to $v_j$. The in-degree and out-degree
of a node $v_i$ are computed as

$$d_i^- = \sum_{q:(q,i) \in \mathcal{E}} c_{qi}, \tag{1}$$

$$d_i^+ = \sum_{l:(i,l) \in \mathcal{E}} c_{il}. \tag{2}$$

As an example, the graph in Fig. 1 consists of nodes $\mathcal{V} = \{v_1, v_2, \ldots, v_6\}$. Integer bandwidths $c_{ij}$ for each
communication link $(i,j)$ are marked.

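Definition 2 (Source and Receiver Nodes): Given a relay network $G = (\mathcal{V}, \mathcal{E})$, the set of source nodes $\mathcal{S} \subset \mathcal{V}$
is defined as $\mathcal{S} = \{v_i \in \mathcal{V} \mid d_i^- = 0\}$. We assume a labeling of nodes in $\mathcal{V}$ so that $\mathcal{S} = \{v_1, v_2, \ldots, v_{|\mathcal{S}|}\}$, i.e.
the first $|\mathcal{S}|$ nodes are source nodes. The set of receiver nodes $\mathcal{T} \subset \mathcal{V}$ is defined as $\mathcal{T} = \{v_i \in \mathcal{V} \mid d_i^+ = 0\}$.¹
Let $\kappa = |\mathcal{V}| - |\mathcal{T}|$. We assume a labeling of nodes in $\mathcal{V}$ so that $\mathcal{T} = \{v_{\kappa+1}, \ldots, v_{|\mathcal{V}|}\}$, i.e. the last $|\mathcal{T}|$
nodes are receiver nodes.
In Fig. 1, $\mathcal{S} = \{v_1, v_2\}$ and $\mathcal{T} = \{v_5, v_6\}$.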
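
Definitions 1 and 2 translate directly into a few lines of code. The sketch below uses the edge set of the Fig. 1 topology as described in the text; the integer bandwidths are hypothetical placeholders, since the values marked in the figure are not reproduced here.

```python
# Edges (i, j) of the Fig. 1 topology with HYPOTHETICAL integer bandwidths c_ij.
bandwidth = {
    (1, 3): 2, (1, 5): 1,
    (2, 3): 2, (2, 6): 1,
    (3, 4): 3,
    (4, 5): 2, (4, 6): 2,
}
nodes = sorted({v for edge in bandwidth for v in edge})

# Eqns. (1)-(2): in-degree and out-degree are sums of incident bandwidths.
in_degree = {v: sum(c for (q, i), c in bandwidth.items() if i == v) for v in nodes}
out_degree = {v: sum(c for (i, l), c in bandwidth.items() if i == v) for v in nodes}

# Definition 2: sources have zero in-degree, receivers have zero out-degree.
sources = [v for v in nodes if in_degree[v] == 0]
receivers = [v for v in nodes if out_degree[v] == 0]
kappa = len(nodes) - len(receivers)    # receivers are labeled kappa+1, ..., |V|

print(sources, receivers, kappa)
```

Note that the degrees are bandwidth-weighted, so $d_i^-$ is also the dimension of the concatenated signal a node receives.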



A. Source Model 

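Definition 3 (Basic Source Model): Given a relay network $G = (\mathcal{V}, \mathcal{E})$ with source/receiver nodes $(\mathcal{S}, \mathcal{T})$,
the source nodes $\mathcal{S} = \{v_i\}_{i=1}^{|\mathcal{S}|}$ observe random signals $\mathcal{X} = \{\mathbf{x}_i\}_{i=1}^{|\mathcal{S}|}$. The random vectors $\mathbf{x}_i \in \mathbb{R}^{n_i}$ are
assumed zero-mean with covariance $\Sigma_{ii}$, and cross-covariances $\Sigma_{ij} \in \mathbb{R}^{n_i \times n_j}$. Let $n = \sum_i n_i$. The distributed
network sources may be grouped into an $n$-dimensional random vector $\mathbf{x} = [\mathbf{x}_1; \mathbf{x}_2; \ldots; \mathbf{x}_{|\mathcal{S}|}]$ with known
second-order statistics $\Sigma_x \in \mathbb{R}^{n \times n}$,

$$\Sigma_x = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} & \cdots & \Sigma_{1|\mathcal{S}|} \\ \Sigma_{21} & \Sigma_{22} & \cdots & \Sigma_{2|\mathcal{S}|} \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma_{|\mathcal{S}|1} & \Sigma_{|\mathcal{S}|2} & \cdots & \Sigma_{|\mathcal{S}||\mathcal{S}|} \end{bmatrix}. \tag{3}$$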



More generally, each source node $v_i \in \mathcal{S}$ emits independent and identically distributed (i.i.d.) source vectors
$\{\mathbf{x}_i[t]\}_{t>0}$ for $t$ a discrete time index; however, in the analysis of zero-delay linear coding, we do not write
the time indices explicitly.

Remark 1: A common linear signal-plus-noise model for sensor networks is of the form $\mathbf{x}_i = \mathbf{H}_i \mathbf{x} + \mathbf{n}_i$;
however, neither a linear source model nor the specific distribution of $\mathbf{x}_i$ is assumed here. A priori knowledge
of second-order statistics may be obtained during a training phase via sample estimation.



¹ For networks of interest in this paper, an arbitrary DAG $G$ may be augmented with auxiliary nodes to ensure that source nodes
have in-degree $d_i^- = 0$ and receiver nodes have out-degree $d_i^+ = 0$.

In Fig. 1, two source nodes $\mathcal{S} = \{v_1, v_2\}$ observe the corresponding random signals in $\mathcal{X} = \{\mathbf{x}_1, \mathbf{x}_2\}$.

B. Communication Model 

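Definition 4 (Communication Model): Given a relay network $G = (\mathcal{V}, \mathcal{E})$ with weight-set $\mathcal{C}$, each edge
$(i,j) \in \mathcal{E}$ represents a communication link of bandwidth $c_{ij}$ from $v_i$ to $v_j$. The bandwidth is the dimension
of the vector channel. We denote signals exiting $v_i \in \mathcal{V}$ along edge $(i,j) \in \mathcal{E}$ by $\mathbf{x}_{ij} \in \mathbb{R}^{c_{ij}}$ and signals
entering node $v_j$ along edge $(i,j) \in \mathcal{E}$ by $\mathbf{y}_{ij} \in \mathbb{R}^{c_{ij}}$. If communication is noiseless, $\mathbf{y}_{ij} = \mathbf{x}_{ij}$. For all relay
nodes and receiver nodes, we further define $\mathbf{y}_j \in \mathbb{R}^{d_j^-}$ to be the concatenation of all signals $\mathbf{y}_{ij}$ incident to
node $v_j$ along edges $(i,j) \in \mathcal{E}$.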

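A noisy communication link $(i,j) \in \mathcal{E}$ is modeled as $\mathbf{y}_{ij} = \mathbf{x}_{ij} + \mathbf{z}_{ij}$. The channel noise $\mathbf{z}_{ij} \in \mathbb{R}^{c_{ij}}$ is
a Gaussian random vector with zero mean and covariance $\Sigma_{z_{ij}}$. The channel input is power constrained so
that $E[\|\mathbf{x}_{ij}\|_2^2] \leq P_{ij}$. The power constraints for a network are given by the set $\mathcal{P} = \{P_{ij} \in \mathbb{R}_+ : (i,j) \in \mathcal{E}\}$.
The signal-to-noise ratio (SNR) along a link is

$$\mathrm{SNR}_{ij} = \frac{E\left[\|\mathbf{x}_{ij}\|_2^2\right]}{E\left[\|\mathbf{z}_{ij}\|_2^2\right]}. \tag{4}$$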



Fig. 1(b) illustrates the signal flow of an LTN graph.
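
As an illustration of the power constraint and Eqn. (4), the sketch below rescales an arbitrary encoder so that a source-to-relay link meets its power budget with equality, then evaluates the link SNR. All dimensions, covariances, and the budget `P_ij` are hypothetical values chosen for the example. It relies on the fact that, for zero-mean signals, $E[\|\mathbf{L}\mathbf{x}\|_2^2] = \mathrm{tr}(\mathbf{L}\Sigma\mathbf{L}^T)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_i, c_ij = 4, 2                       # source dimension and link bandwidth (hypothetical)
A = rng.standard_normal((n_i, n_i))
Sigma_i = A @ A.T                      # covariance of the source x_i
Sigma_z = 0.5 * np.eye(c_ij)           # noise covariance on link (i, j)
P_ij = 3.0                             # power budget for the link

L = rng.standard_normal((c_ij, n_i))   # an arbitrary encoder, x_ij = L x_i
tx_power = np.trace(L @ Sigma_i @ L.T) # E||x_ij||^2 before scaling
L = L * np.sqrt(P_ij / tx_power)       # rescale so the constraint holds with equality

tx_power = np.trace(L @ Sigma_i @ L.T) # E||x_ij||^2 after scaling
noise_power = np.trace(Sigma_z)        # E||z_ij||^2
snr = tx_power / noise_power           # Eqn. (4)
print(tx_power, snr)
```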



C. Linear Encoding over Graph G 

Source and relay nodes encode random vector signals by applying reduced-dimension linear transforms. 

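Definition 5 (Linear Encoding): Given a relay network $G = (\mathcal{V}, \mathcal{E})$, weight-set $\mathcal{C}$, source/receiver nodes
$(\mathcal{S}, \mathcal{T})$, sources $\mathcal{X}$, and the communication model of Definition 4, the linear encoding matrices for $G$ are
denoted by the set $\mathcal{L}_G = \{\mathbf{L}_{ij} : (i,j) \in \mathcal{E}\}$. Each $\mathbf{L}_{ij}$ represents the linear transform applied by node $v_i$ in
communication with node $v_j$. For $v_i \in \mathcal{S}$, transform $\mathbf{L}_{ij}$ is of size $c_{ij} \times n_i$ and represents the encoding
$\mathbf{x}_{ij} = \mathbf{L}_{ij}\mathbf{x}_i$. For a relay $v_i$, transform $\mathbf{L}_{ij}$ is of size $c_{ij} \times d_i^-$, and $\mathbf{x}_{ij} = \mathbf{L}_{ij}\mathbf{y}_i$. The compression ratio along
edge $(i,j) \in \mathcal{E}$ is

$$\alpha_{ij} = \frac{c_{ij}}{n_i} \quad \text{if } v_i \in \mathcal{S}, \tag{5a}$$

$$\alpha_{ij} = \frac{c_{ij}}{d_i^-} \quad \text{if } v_i \in \mathcal{V} \setminus \mathcal{S}. \tag{5b}$$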

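In Fig. 1, the linear encoding matrices for source nodes $v_1$ and $v_2$ are $\{\mathbf{L}_{15}, \mathbf{L}_{13}\}$ and $\{\mathbf{L}_{26}, \mathbf{L}_{23}\}$ respectively.
The linear encoding matrices for the relays are $\mathbf{L}_{34}$, $\mathbf{L}_{45}$, $\mathbf{L}_{46}$. The output signals of source node $v_1$ are
$\mathbf{x}_{15} = \mathbf{L}_{15}\mathbf{x}_1$ and $\mathbf{x}_{13} = \mathbf{L}_{13}\mathbf{x}_1$. Similarly, the output signal of relay $v_3$ is

$$\mathbf{x}_{34} = \mathbf{L}_{34}\mathbf{y}_3 = \mathbf{L}_{34} \begin{bmatrix} \mathbf{y}_{13} \\ \mathbf{y}_{23} \end{bmatrix}. \tag{6}$$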



D. Linear Estimation over G 



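Definition 6 (Linear Estimation): Given a relay network $G = (\mathcal{V}, \mathcal{E})$, weight-set $\mathcal{C}$, source/receiver nodes
$(\mathcal{S}, \mathcal{T})$, sources $\mathcal{X}$, and the communication model of Definition 4, the set of linear decoding matrices is denoted
$\mathcal{B}_G = \{\mathbf{B}_i\}_{i: v_i \in \mathcal{T}}$. Each receiver $v_i \in \mathcal{T}$ estimates a (zero-mean) random vector $\mathbf{r}_i \in \mathbb{R}^{r_i}$ which is correlated
with the sources in $\mathcal{X}$. We assume that the second-order statistics $\Sigma_{r_i}$ and $\Sigma_{r_i x}$ are known. Receiver $v_i \in \mathcal{T}$
applies a linear estimator given by matrix $\mathbf{B}_i \in \mathbb{R}^{r_i \times d_i^-}$ to estimate $\mathbf{r}_i$ given its observations and computes
$\hat{\mathbf{r}}_i = \mathbf{B}_i\mathbf{y}_i$. The linear least squares estimate (LLSE) of $\mathbf{r}_i$ is denoted by $\hat{\mathbf{r}}_i$.

In Fig. 1, receiver $v_5$ reconstructs $\mathbf{r}_5$ while receiver $v_6$ reconstructs $\mathbf{r}_6$. The LLSE signals $\hat{\mathbf{r}}_5$ and $\hat{\mathbf{r}}_6$ are
computed as

$$\hat{\mathbf{r}}_5 = \mathbf{B}_5\mathbf{y}_5 = \mathbf{B}_5 \begin{bmatrix} \mathbf{y}_{15} \\ \mathbf{y}_{45} \end{bmatrix}, \tag{7}$$

$$\hat{\mathbf{r}}_6 = \mathbf{B}_6\mathbf{y}_6 = \mathbf{B}_6 \begin{bmatrix} \mathbf{y}_{26} \\ \mathbf{y}_{46} \end{bmatrix}. \tag{8}$$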



Definition 7 (Distortion Metric): Let $\mathbf{x}$ and $\mathbf{y}$ be two real vectors of the same dimension. The MSE
distortion metric is defined as

$$d_{\mathrm{mse}}(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_2^2. \tag{9}$$



E. Compression-Estimation in Networks 

Definition 8 (Linear Transform Network $\mathcal{N}$): An LTN model $\mathcal{N}$ is a communication network modeled by
DAG $G = (\mathcal{V}, \mathcal{E})$, weight-set $\mathcal{C}$, source/receiver nodes $(\mathcal{S}, \mathcal{T})$, sources $\mathcal{X}$, sets $\mathcal{L}_G$, and $\mathcal{B}_G$ from Definitions 1-6.
Second-order source statistics are given by $\Sigma_x$ (Definition 3). The operational meaning of
compression-estimation matrices in $\mathcal{L}_G$ and $\mathcal{B}_G$ is in terms of signal flows on $G$ (Definition 4). The desired reconstruction
vectors $\{\mathbf{r}_i\}_{i: v_i \in \mathcal{T}}$ have known second-order statistics $\Sigma_{r_i}$ and $\Sigma_{r_i x}$. The set $\{\hat{\mathbf{r}}_i\}_{i: v_i \in \mathcal{T}}$ denotes the LLSE
estimates formed at receivers (Definition 6). For noisy networks, noise variables along link $(i,j) \in \mathcal{E}$ have
known covariances $\Sigma_{z_{ij}}$. Power constraints are given by the set $\mathcal{P}$ in Definition 4.

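Given an LTN graph $\mathcal{N}$, the task is to design a network transform code: the compression-estimation matrices
in $\mathcal{L}_G$ and $\mathcal{B}_G$ to minimize the end-to-end weighted MSE distortion. Let positive weights $\{w_i\}_{i: v_i \in \mathcal{T}}$ represent
the relative importance of reconstructing a signal at receiver $v_i \in \mathcal{T}$. Using indexing term $\kappa = |\mathcal{V}| - |\mathcal{T}|$
for receiver nodes, we concatenate vectors $\mathbf{r}_i$ as $\mathbf{r} = [\mathbf{r}_{\kappa+1}; \mathbf{r}_{\kappa+2}; \ldots; \mathbf{r}_{|\mathcal{V}|}]$ and LLSE estimates $\hat{\mathbf{r}}_i$ as
$\hat{\mathbf{r}} = [\hat{\mathbf{r}}_{\kappa+1}; \hat{\mathbf{r}}_{\kappa+2}; \ldots; \hat{\mathbf{r}}_{|\mathcal{V}|}]$. The average weighted MSE written via a weighted $\ell_2$-norm is

$$D_{\mathrm{MSE},W} = E\left[\|\mathbf{r} - \hat{\mathbf{r}}\|_{\mathbf{W}}^2\right], \tag{10}$$

where $\mathbf{W}$ contains diagonal blocks $\mathbf{W}_i = \sqrt{w_i}\,\mathbf{I}$.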
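
A short numerical check may help fix the role of $\mathbf{W}$: with diagonal blocks $\sqrt{w_i}\,\mathbf{I}$, the squared weighted norm of an error vector is the $w_i$-weighted sum of the per-receiver squared errors. The receiver dimensions and weights below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
dims, weights = [2, 3], [1.0, 4.0]     # hypothetical receiver dimensions and weights w_i

# W is block diagonal with blocks sqrt(w_i) * I.
W = np.diag(np.concatenate([np.sqrt(w) * np.ones(d) for w, d in zip(weights, dims)]))

err = rng.standard_normal(sum(dims))   # one realisation of r - r_hat
weighted_sq_norm = np.linalg.norm(W @ err) ** 2

# Split the error into per-receiver blocks and weight each squared norm by w_i.
blocks = np.split(err, np.cumsum(dims)[:-1])
weighted_sum = sum(w * np.linalg.norm(e) ** 2 for w, e in zip(weights, blocks))
print(weighted_sq_norm, weighted_sum)
```

Taking expectations of both sides gives the decomposition of the weighted distortion into a weighted sum of individual receiver distortions.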

Remark 2: The distortion $D_{\mathrm{MSE},W}$ is a function of the compression matrices in $\mathcal{L}_G$ and the estimation
matrices in $\mathcal{B}_G$. In most network topologies, the weighted MSE distortion is non-convex over the set of
feasible matrices. Even in the particular case of distributed compression lH, currently the optimal linear 
transforms are not solvable in closed form. 

III. Linear Signal Processing in Networks 

The linear processing and filtering of source signals by an LTN graph $\mathcal{N}$ is modeled compactly as a linear
system with inputs, outputs, and memory elements. At each time step, LTN nodes transmit random signals 
through edges/channels of the graph. 

A. Linear System 

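Consider edge $(i,j) \in \mathcal{E}$ as a memory element storing random vector $\mathbf{y}_{ij}$. Let $c = \sum_{(i,j) \in \mathcal{E}} c_{ij}$ and
$d = \sum_{i: v_i \in \mathcal{T}} d_i^-$. The network is modeled as a linear system with the following signals: (i) input sources
$\{\mathbf{x}_i\}_{i: v_i \in \mathcal{S}}$ concatenated as global source vector $\mathbf{x} \in \mathbb{R}^n$; (ii) input noise variables $\{\mathbf{z}_{ij}\}_{(i,j) \in \mathcal{E}}$ concatenated as
global noise vector $\mathbf{z} \in \mathbb{R}^c$; (iii) memory elements $\{\mathbf{y}_{ij}\}_{(i,j) \in \mathcal{E}}$ concatenated as global state vector $\boldsymbol{\mu}[t] \in \mathbb{R}^c$
at time $t$; (iv) output vectors $\{\mathbf{y}_i\}_{i: v_i \in \mathcal{T}}$ concatenated as $\mathbf{y} \in \mathbb{R}^d$.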

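1) State-space Equations: The linear system² is described by the following state-space equations for
$i : v_i \in \mathcal{T}$,

$$\boldsymbol{\mu}[t+1] = \mathbf{F}\boldsymbol{\mu}[t] + \mathbf{E}\mathbf{x}[t] + \tilde{\mathbf{E}}\mathbf{z}[t], \tag{11}$$

$$\mathbf{y}_i[t] = \mathbf{C}_i\boldsymbol{\mu}[t] + \mathbf{D}_i\mathbf{x}[t] + \tilde{\mathbf{D}}_i\mathbf{z}[t]. \tag{12}$$

The matrix $\mathbf{F} \in \mathbb{R}^{c \times c}$ is the state-evolution matrix common to all receivers, $\mathbf{E} \in \mathbb{R}^{c \times n}$ is the source-network
connectivity matrix, and $\tilde{\mathbf{E}} \in \mathbb{R}^{c \times c}$ is the noise-to-network connectivity matrix. The matrices $\mathbf{C}_i \in \mathbb{R}^{d_i^- \times c}$,
$\mathbf{D}_i \in \mathbb{R}^{d_i^- \times n}$, and $\tilde{\mathbf{D}}_i \in \mathbb{R}^{d_i^- \times c}$ represent how each receiver's output is related to the state, source, and noise
vectors respectively. For networks considered in this paper, $\mathbf{D}_i = 0$ and $\tilde{\mathbf{D}}_i = 0$.

² When discussing zero-delay linear coding, the time indices on vectors $\mathbf{x}$, $\mathbf{z}$, and $\mathbf{y}_i$ are omitted for greater clarity of presentation.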

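2) Linear Transfer Function: A standard result in linear system theory yields the transfer function
(assuming a unity indeterminate delay operator) for each receiver $v_i \in \mathcal{T}$,

$$\mathbf{y}_i = \mathbf{C}_i(\mathbf{I} - \mathbf{F})^{-1}(\mathbf{E}\mathbf{x} + \tilde{\mathbf{E}}\mathbf{z}), \tag{13}$$

$$\phantom{\mathbf{y}_i} = \mathbf{G}_i\mathbf{x} + \tilde{\mathbf{G}}_i\mathbf{z}, \tag{14}$$

where $\mathbf{G}_i = \mathbf{C}_i(\mathbf{I} - \mathbf{F})^{-1}\mathbf{E}$ and $\tilde{\mathbf{G}}_i = \mathbf{C}_i(\mathbf{I} - \mathbf{F})^{-1}\tilde{\mathbf{E}}$. For acyclic graphs, $\mathbf{F}$ is a nilpotent matrix and
$(\mathbf{I} - \mathbf{F})^{-1} = \mathbf{I} + \sum_{k=2}^{\gamma} \mathbf{F}^{k-1}$ for finite integer $\gamma$. Using indexing term $\kappa$, the observation vectors collected by
receivers are concatenated as $\mathbf{y} = [\mathbf{y}_{\kappa+1}; \mathbf{y}_{\kappa+2}; \ldots; \mathbf{y}_{|\mathcal{V}|}]$. Let

$$\mathbf{T} = [\mathbf{G}_{\kappa+1}; \mathbf{G}_{\kappa+2}; \ldots; \mathbf{G}_{|\mathcal{V}|}], \tag{15}$$

and let $\tilde{\mathbf{T}}$ be defined similarly with respect to matrices $\tilde{\mathbf{G}}_i$. Then the complete linear transfer function of the
network $\mathcal{N}$ is $\mathbf{y} = \mathbf{T}\mathbf{x} + \tilde{\mathbf{T}}\mathbf{z}$. Analog processing of signals without error control implies noise propagation;
the additive noise $\mathbf{z}$ is also linearly filtered by the network via $\tilde{\mathbf{T}}$.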

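Example 1: Fig. 2 is the LTN graph of a noisy relay network. Let state $\boldsymbol{\mu} = [\mathbf{y}_{12}; \mathbf{y}_{13}; \mathbf{y}_{23}]$, $\mathbf{z} =
[\mathbf{z}_{12}; \mathbf{z}_{13}; \mathbf{z}_{23}]$, and output $\mathbf{y}_3 = [\mathbf{y}_{13}; \mathbf{y}_{23}]$. The linear system representation is given as follows,

$$\boldsymbol{\mu}[t+1] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ \mathbf{L}_{23} & 0 & 0 \end{bmatrix} \boldsymbol{\mu}[t] + \begin{bmatrix} \mathbf{L}_{12} \\ \mathbf{L}_{13} \\ 0 \end{bmatrix} \mathbf{x}_1[t] + \mathbf{I}\,\mathbf{z}[t],$$

$$\mathbf{y}_3[t] = \begin{bmatrix} 0 & \mathbf{I} & 0 \\ 0 & 0 & \mathbf{I} \end{bmatrix} \boldsymbol{\mu}[t].$$

By evaluating Eqn. (14),

$$\mathbf{y}_3[t] = \begin{bmatrix} \mathbf{L}_{13} \\ \mathbf{L}_{23}\mathbf{L}_{12} \end{bmatrix} \mathbf{x}_1[t] + \begin{bmatrix} 0 & \mathbf{I} & 0 \\ \mathbf{L}_{23} & 0 & \mathbf{I} \end{bmatrix} \mathbf{z}[t].$$

Dropping the time indices and writing $\mathbf{x} = \mathbf{x}_1$ in addition to $\mathbf{y} = \mathbf{y}_3$, the linear transfer function of the noisy
relay network is of the following form: $\mathbf{y} = \mathbf{T}\mathbf{x} + \tilde{\mathbf{T}}\mathbf{z}$.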
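
The transfer function of Example 1 can be reproduced from the state-space matrices. The sketch below assumes hypothetical dimensions and random encoding matrices; the variable names `E_noise` and `T_noise` stand in for the noise connectivity matrix and the noise transfer function, and are labels chosen here rather than the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
n, c12, c13, c23 = 4, 3, 2, 2          # hypothetical source dimension and bandwidths
L12 = rng.standard_normal((c12, n))
L13 = rng.standard_normal((c13, n))
L23 = rng.standard_normal((c23, c12))
c = c12 + c13 + c23

# State mu = [y12; y13; y23]; only y23 depends on an earlier state (y12).
F = np.zeros((c, c))
F[c12 + c13:, :c12] = L23
E = np.vstack([L12, L13, np.zeros((c23, n))])   # source-to-network connectivity
E_noise = np.eye(c)                             # every edge carries its own noise
C = np.hstack([np.zeros((c13 + c23, c12)), np.eye(c13 + c23)])  # y3 = [y13; y23]

# F is nilpotent for a DAG; here F @ F = 0, so (I - F)^{-1} = I + F.
inv = np.linalg.inv(np.eye(c) - F)

T = C @ inv @ E                # source transfer function of Eqn. (14)
T_noise = C @ inv @ E_noise    # noise transfer function of Eqn. (14)
T_expected = np.vstack([L13, L23 @ L12])
print(np.allclose(T, T_expected))
```

The noise on edge $(1,2)$ reaches the receiver only after being filtered by $\mathbf{L}_{23}$, which is visible as the lower-left block of `T_noise`.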



B. Layered Networks 

Definition 9 (Layered DAG Network): A layering of a DAG $G = (\mathcal{V}, \mathcal{E})$ is a partition of $\mathcal{V}$ into disjoint
subsets $\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_{p+1}$ such that if directed edge $(u, v) \in \mathcal{E}$, where $u \in \mathcal{V}_j$ and $v \in \mathcal{V}_k$, then $j > k$. A
DAG layering (non-unique) is polynomial-time computable [32].
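
One simple way to obtain a valid layering is longest-path layering, sketched below for the edge set of the Fig. 1 topology. This is an illustrative choice, not necessarily the procedure of [32]; it places each node one layer above the deepest node it transmits to, so every edge points from a higher-indexed layer to a lower-indexed one.

```python
from functools import lru_cache

# Edge list (i, j) of the Fig. 1 topology (nodes v1, ..., v6).
edges = [(1, 3), (1, 5), (2, 3), (2, 6), (3, 4), (4, 5), (4, 6)]
nodes = sorted({v for e in edges for v in e})
children = {v: [b for a, b in edges if a == v] for v in nodes}

@lru_cache(maxsize=None)
def layer(v):
    """Layer index = 1 + length of the longest directed path from v to a receiver."""
    return 1 + max((layer(w) for w in children[v]), default=0)

layers = {}
for v in nodes:
    layers.setdefault(layer(v), []).append(v)

for index in sorted(layers):
    print(index, layers[index])
```

For this graph the receivers fall in layer 1 and both sources in layer 4. In other topologies a source may land in a lower layer, and it can then be moved up to the top layer without violating the edge condition.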
Fig. 2. The LTN graph of a noisy relay network with $\mathcal{S} = \{v_1\}$ and $\mathcal{T} = \{v_3\}$. The linear processing of the network is modeled
as a linear system with input $\mathbf{x}_1$ and output $\mathbf{y}_3 = [\mathbf{y}_{13}; \mathbf{y}_{23}]$.



Given a layered partition $\{\mathcal{V}_\ell\}_{\ell=1}^{p+1}$ of an LTN graph, source nodes $v_i \in \mathcal{S}$ with in-degree $d_i^- = 0$ may be
placed in partition $\mathcal{V}_{p+1}$. Similarly, receivers $v_i \in \mathcal{T}$ with out-degree $d_i^+ = 0$ may be placed in partition $\mathcal{V}_1$.
The transfer function $\mathbf{T}$ in Eqn. (13) may be factored into a product of matrices,

$$\mathbf{T} = \mathbf{T}_{1:p} = \mathbf{T}_1\mathbf{T}_2 \cdots \mathbf{T}_p, \tag{16}$$

where $\mathbf{T}_\ell$ for $1 \leq \ell \leq p$ is the linear transformation of signals between nodes in partition $\mathcal{V}_{\ell+1}$ and $\mathcal{V}_\ell$
(note the reverse ordering of the $\mathbf{T}_\ell$ with respect to the partitions $\mathcal{V}_\ell$). If an edge exists between nodes in
non-consecutive partitions, an identity transform is inserted to replicate signals between multiple layers. Due
to the linearity of transforms, for any layered partition $\{\mathcal{V}_\ell\}_{\ell=1}^{p+1}$ of $\mathcal{V}$, the layered transforms $\{\mathbf{T}_\ell\}_{\ell=1}^{p}$ can
be constructed. The $\{\mathbf{T}_\ell\}_{\ell=1}^{p}$ are structured matrices comprised of sub-blocks $\mathbf{L}_{ij}$, identity matrices, and/or
zero matrices. The block structure is determined by the network topology.

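Example 2: For the multiple unicast network of Fig. 1, a valid layered partition of $\mathcal{V}$ is $\mathcal{V}_1 = \{v_5, v_6\}$,
$\mathcal{V}_2 = \{v_4\}$, $\mathcal{V}_3 = \{v_3\}$, and $\mathcal{V}_4 = \{v_1, v_2\}$. Let $\mathbf{x} = [\mathbf{x}_1; \mathbf{x}_2]$, $\mathbf{y} = [\mathbf{y}_5; \mathbf{y}_6] = [\mathbf{y}_{15}; \mathbf{y}_{45}; \mathbf{y}_{46}; \mathbf{y}_{26}]$, and
let $\mathbf{L}_{34}$ be partitioned as $\mathbf{L}_{34} = [\mathbf{L}'_{34} \;\; \mathbf{L}''_{34}]$. According to the layering, the transfer matrix $\mathbf{T}$ is factored in
product form $\mathbf{T} = \mathbf{T}_1\mathbf{T}_2\mathbf{T}_3$,

$$\mathbf{T} = \begin{bmatrix} \mathbf{I} & 0 & 0 \\ 0 & \mathbf{L}_{45} & 0 \\ 0 & \mathbf{L}_{46} & 0 \\ 0 & 0 & \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{I} & 0 & 0 & 0 \\ 0 & \mathbf{L}'_{34} & \mathbf{L}''_{34} & 0 \\ 0 & 0 & 0 & \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{L}_{15} & 0 \\ \mathbf{L}_{13} & 0 \\ 0 & \mathbf{L}_{23} \\ 0 & \mathbf{L}_{26} \end{bmatrix}.$$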
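
The factorization of Example 2 can be verified by comparing the layered product against a direct node-by-node propagation of the signals. The following sketch uses hypothetical dimensions and random encoding matrices; `Z` is merely shorthand for `np.zeros`.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2 = 3, 3                                   # hypothetical source dimensions
c15, c13, c23, c26, c34, c45, c46 = 1, 2, 2, 1, 3, 2, 2   # hypothetical bandwidths

L15, L13 = rng.standard_normal((c15, n1)), rng.standard_normal((c13, n1))
L23, L26 = rng.standard_normal((c23, n2)), rng.standard_normal((c26, n2))
L34 = rng.standard_normal((c34, c13 + c23))     # acts on y3 = [y13; y23]
L45, L46 = rng.standard_normal((c45, c34)), rng.standard_normal((c46, c34))
Z = np.zeros

# T3: sources -> [y15; y13; y23; y26]
T3 = np.block([[L15, Z((c15, n2))],
               [L13, Z((c13, n2))],
               [Z((c23, n1)), L23],
               [Z((c26, n1)), L26]])
# T2: relay v3 mixes [y13; y23]; identities carry y15 and y26 across the layer.
T2 = np.block([[np.eye(c15), Z((c15, c13 + c23)), Z((c15, c26))],
               [Z((c34, c15)), L34, Z((c34, c26))],
               [Z((c26, c15)), Z((c26, c13 + c23)), np.eye(c26)]])
# T1: relay v4 -> receivers, output ordered as [y15; y45; y46; y26].
T1 = np.block([[np.eye(c15), Z((c15, c34)), Z((c15, c26))],
               [Z((c45, c15)), L45, Z((c45, c26))],
               [Z((c46, c15)), L46, Z((c46, c26))],
               [Z((c26, c15)), Z((c26, c34)), np.eye(c26)]])
T = T1 @ T2 @ T3

# Direct propagation through the graph for one realisation of the sources.
x1, x2 = rng.standard_normal(n1), rng.standard_normal(n2)
y34 = L34 @ np.concatenate([L13 @ x1, L23 @ x2])
y_direct = np.concatenate([L15 @ x1, L45 @ y34, L46 @ y34, L26 @ x2])
y_layered = T @ np.concatenate([x1, x2])
print(np.allclose(y_direct, y_layered))
```

The identity blocks in `T2` and `T1` are the inserted identity transforms mentioned above for edges that skip a layer, here the direct links from $v_1$ to $v_5$ and from $v_2$ to $v_6$.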



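Example 3: Consider the setting of Example 1 for the relay network shown in Fig. 2. A valid layered
partition of $\mathcal{V}$ is $\mathcal{V}_1 = \{v_3\}$, $\mathcal{V}_2 = \{v_2\}$, $\mathcal{V}_3 = \{v_1\}$. According to the layering, the transfer matrix $\mathbf{T}$ may
be written in product form $\mathbf{T} = \mathbf{T}_1\mathbf{T}_2$,

$$\mathbf{T} = \begin{bmatrix} \mathbf{I} & 0 \\ 0 & \mathbf{L}_{23} \end{bmatrix} \begin{bmatrix} \mathbf{L}_{13} \\ \mathbf{L}_{12} \end{bmatrix}.$$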



IV. Optimizing Compression-Estimation Matrices 

Our optimization method proceeds iteratively over network layers. To simplify the optimization, we first
assume ideal channels (high-SNR communication) for which $\mathbf{y}_{ij} = \mathbf{x}_{ij}$. Then the linear operation of the
network $\mathcal{N}$ is $\mathbf{y} = \mathbf{T}\mathbf{x}$ with $\mathbf{z} = 0$. Linear transform coding is constrained according to bandwidth
compression ratios $\alpha_{ij}$.

A. MSE Distortion at Receivers 

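According to the linear system equations, Eqns. (11)-(14), each receiver $v_i \in \mathcal{T}$ receives filtered source
observations $\mathbf{y}_i = \mathbf{G}_i\mathbf{x}$. Receiver $v_i$ applies a linear estimator $\mathbf{B}_i$ to estimate signal $\mathbf{r}_i$. The MSE cost of
estimation is

$$D_i = E\left[\|\mathbf{r}_i - \mathbf{B}_i\mathbf{G}_i\mathbf{x}\|_2^2\right] = \mathrm{tr}(\Sigma_{r_i}) - 2\,\mathrm{tr}(\mathbf{B}_i\mathbf{G}_i\Sigma_{x r_i}) + \mathrm{tr}(\mathbf{B}_i\mathbf{G}_i\Sigma_x\mathbf{G}_i^T\mathbf{B}_i^T). \tag{17}$$

Setting the matrix derivative with respect to $\mathbf{B}_i$ in Eqn. (17) to zero yields $-2\Sigma_{r_i x}\mathbf{G}_i^T + 2\mathbf{B}_i\mathbf{G}_i\Sigma_x\mathbf{G}_i^T = 0$.
For a fixed transfer function $\mathbf{G}_i$, the optimal LLSE matrix $\mathbf{B}_i^{\mathrm{opt}}$ is

$$\mathbf{B}_i^{\mathrm{opt}} = \Sigma_{r_i x}\mathbf{G}_i^T\left[\mathbf{G}_i\Sigma_x\mathbf{G}_i^T\right]^{-1}. \tag{18}$$

If the matrix $\mathbf{G}_i\Sigma_x\mathbf{G}_i^T$ in Eqn. (18) is singular, the inverse may be replaced with a pseudo-inverse operation to compute $\mathbf{B}_i^{\mathrm{opt}}$.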
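
A small numerical check of Eqns. (17)-(18) is sketched below, assuming a randomly generated joint covariance of the sources and the target and a random fixed transfer function. The function name `distortion` and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, r = 5, 3, 2                      # source, observation and target dimensions (hypothetical)
A = rng.standard_normal((n + r, n + r))
Sigma = A @ A.T                        # joint covariance of [x; r_i]
Sigma_x, Sigma_rx, Sigma_r = Sigma[:n, :n], Sigma[n:, :n], Sigma[n:, n:]
G = rng.standard_normal((m, n))        # fixed transfer function G_i

def distortion(B):
    """Eqn. (17): MSE of estimating r_i from y_i = G x with estimator B."""
    return (np.trace(Sigma_r)
            - 2 * np.trace(B @ G @ Sigma_rx.T)
            + np.trace(B @ G @ Sigma_x @ G.T @ B.T))

# Eqn. (18); pinv coincides with the inverse when G Sigma_x G^T is nonsingular.
B_opt = Sigma_rx @ G.T @ np.linalg.pinv(G @ Sigma_x @ G.T)

# The stationarity condition should vanish at B_opt.
gradient = -2 * Sigma_rx @ G.T + 2 * B_opt @ G @ Sigma_x @ G.T

# Any perturbation of B_opt should not reduce the distortion.
B_perturbed = B_opt + 0.1 * rng.standard_normal(B_opt.shape)
print(distortion(B_opt), distortion(B_perturbed))
```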

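Let $\mathbf{B}$ denote a block diagonal global matrix containing individual decoding matrices $\{\mathbf{B}_i\}_{i: v_i \in \mathcal{T}}$ on the
diagonal. For an LTN graph $\mathcal{N}$ with encoding transfer function $\mathbf{T} = \mathbf{T}_{1:p}$, we write the linear decoding
operation of all receivers as $\hat{\mathbf{r}} = \mathbf{B}\mathbf{y}$ where $\mathbf{y} = \mathbf{T}_{1:p}\mathbf{x}$ are the observations received. The weighted MSE
cost in Eqn. (10) for reconstructing signals $\{\mathbf{r}_i\}_{i: v_i \in \mathcal{T}}$ at all receivers is written as

$$\begin{aligned} D_{\mathrm{MSE},W} &= E\left[\|\mathbf{r} - \hat{\mathbf{r}}\|_{\mathbf{W}}^2\right] \\ &= E\left[\|\mathbf{r} - \mathbf{B}\mathbf{T}_{1:p}\mathbf{x}\|_{\mathbf{W}}^2\right] \\ &= \mathrm{tr}(\mathbf{W}\Sigma_r\mathbf{W}^T) - 2\,\mathrm{tr}(\mathbf{W}\mathbf{B}\mathbf{T}_{1:p}\Sigma_{xr}\mathbf{W}^T) + \mathrm{tr}(\mathbf{W}\mathbf{B}\mathbf{T}_{1:p}\Sigma_x\mathbf{T}_{1:p}^T\mathbf{B}^T\mathbf{W}^T). \end{aligned} \tag{19}$$

By construction of the weighting matrix $\mathbf{W}$, the MSE in Eqn. (19) is a weighted sum of individual distortions
at receivers, i.e. $D_{\mathrm{MSE},W} = \sum_{i: v_i \in \mathcal{T}} w_i D_i$.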
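B. Computing Encoding Transforms $\mathbf{T}_i$

The optimization of the network transfer function $\mathbf{T} = \mathbf{T}_{1:p}$ is more complex due to block constraints
imposed by the network topology on matrices $\{\mathbf{T}_i\}_{i=1}^{p}$. In order to solve for a particular linear transform
$\mathbf{T}_i$, we assume all linear transforms $\mathbf{T}_j$, $j \neq i$, and the receivers' decoding transform $\mathbf{B}$ are fixed. Then
the optimal $\mathbf{T}_i$ is the solution to a constrained quadratic program. To derive this, we utilize the following
identities in which $\mathbf{x} = \mathrm{vec}(\mathbf{X})$:

$$\mathrm{tr}(\mathbf{A}^T\mathbf{X}) = \mathrm{vec}(\mathbf{A})^T\mathbf{x}, \tag{20}$$

$$\mathrm{tr}(\mathbf{X}^T\mathbf{A}_1\mathbf{X}\mathbf{A}_2) = \mathbf{x}^T(\mathbf{A}_2^T \otimes \mathbf{A}_1)\mathbf{x}. \tag{21}$$

We write the network's linear transfer function as $\mathbf{T} = \mathbf{T}_{1:p} = \mathbf{T}_{1:i-1}\mathbf{T}_i\mathbf{T}_{i+1:p}$ and define the following
matrices

$$\mathbf{J}_i \triangleq \mathbf{T}_{i+1:p}\Sigma_{xr}\mathbf{W}^T\mathbf{W}\mathbf{B}\mathbf{T}_{1:i-1}, \tag{22}$$

$$\mathbf{J}'_i \triangleq (\mathbf{T}_{1:i-1})^T\mathbf{B}^T\mathbf{W}^T\mathbf{W}\mathbf{B}\mathbf{T}_{1:i-1}, \tag{23}$$

$$\mathbf{J}''_i \triangleq \mathbf{T}_{i+1:p}\Sigma_x(\mathbf{T}_{i+1:p})^T. \tag{24}$$

To write $D_{\mathrm{MSE},W}$ in terms of the matrix variable $\mathbf{T}_i$, we also define the following,

$$p_i \triangleq \mathrm{tr}(\mathbf{W}\Sigma_r\mathbf{W}^T), \tag{25}$$

$$\mathbf{p}_i \triangleq -2\,\mathrm{vec}(\mathbf{J}_i^T), \tag{26}$$

$$\mathbf{P}_i \triangleq \mathbf{J}''_i \otimes \mathbf{J}'_i, \tag{27}$$

where $p_i$, $\mathbf{p}_i$, and $\mathbf{P}_i$ are a scalar, vector, and positive semi-definite matrix respectively. Since $\mathbf{J}''_i$ is
symmetric, Eqn. (27) agrees with the Kronecker factor of identity (21). The following lemma
expresses $D_{\mathrm{MSE},W}$ as a function of the unknown matrix variable $\mathbf{T}_i$.

Lemma 1: Let transforms $\mathbf{T}_j$, $j \neq i$, and $\mathbf{B}$ be fixed. Let $\mathbf{J}_i$, $\mathbf{J}'_i$, $\mathbf{J}''_i$ be defined in Eqns. (22)-(24), and
$p_i$, $\mathbf{p}_i$, and $\mathbf{P}_i$ be defined in Eqns. (25)-(27). Then the weighted MSE distortion $D_{\mathrm{MSE},W}$ of Eqn. (19) is a
quadratic function of $\mathbf{t}_i = \mathrm{vec}(\mathbf{T}_i)$,

$$D_{\mathrm{MSE},W} = \mathbf{t}_i^T\mathbf{P}_i\mathbf{t}_i + \mathbf{p}_i^T\mathbf{t}_i + p_i. \tag{28}$$

Proof: Substituting the expressions for $\mathbf{J}_i$, $\mathbf{J}'_i$, $\mathbf{J}''_i$ in Eqns. (22)-(24) into Eqn. (19) produces the
intermediate equation: $D_{\mathrm{MSE},W} = \mathrm{tr}(\mathbf{T}_i^T\mathbf{J}'_i\mathbf{T}_i\mathbf{J}''_i) - 2\,\mathrm{tr}(\mathbf{J}_i\mathbf{T}_i) + p_i$. Directly applying the vector-matrix identities
of Eqns. (20)-(21) results in Eqn. (28). ■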
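
Lemma 1 can be sanity-checked numerically by evaluating the distortion both directly from the trace expression and through the quadratic form in the vectorized middle layer. The sketch below assumes a three-layer factorization with hypothetical layer widths, random matrices, and two scalar receivers with weights 1 and 2; `vec` stacks columns, matching the notation section.

```python
import numpy as np

rng = np.random.default_rng(5)
n, b, a, d, r = 5, 4, 3, 3, 2          # hypothetical layer widths and target dimension
A = rng.standard_normal((n + r, n + r))
Sigma = A @ A.T                        # joint covariance of [x; r]
Sigma_x, Sigma_xr, Sigma_r = Sigma[:n, :n], Sigma[:n, n:], Sigma[n:, n:]

T1, T2, T3 = (rng.standard_normal(s) for s in [(d, a), (a, b), (b, n)])
B = rng.standard_normal((r, d))
W = np.diag(np.sqrt([1.0, 2.0]))       # W_i = sqrt(w_i) I for two scalar receivers

def vec(X):
    return X.reshape(-1, order="F")    # stack the columns of X

# Direct evaluation of the trace expression with T = T1 T2 T3.
T = T1 @ T2 @ T3
D_direct = (np.trace(W @ Sigma_r @ W.T)
            - 2 * np.trace(W @ B @ T @ Sigma_xr @ W.T)
            + np.trace(W @ B @ T @ Sigma_x @ T.T @ B.T @ W.T))

# Quadratic form in t2 = vec(T2), i.e. the lemma with i = 2.
J = T3 @ Sigma_xr @ W.T @ W @ B @ T1
J1 = T1.T @ B.T @ W.T @ W @ B @ T1
J2 = T3 @ Sigma_x @ T3.T
P, p_vec, p0 = np.kron(J2, J1), -2 * vec(J.T), np.trace(W @ Sigma_r @ W.T)
t2 = vec(T2)
D_quadratic = t2 @ P @ t2 + p_vec @ t2 + p0
print(D_direct, D_quadratic)
```

The two values should agree up to floating-point error, and `P` should be symmetric positive semi-definite because it is a Kronecker product of two such matrices.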
Fig. 3. (a) Block diagram of the "hybrid network" example. (b) The end-to-end distortion vs. compression for varying bandwidth
$c = c_{13} = c_{23}$. The network operates in one of three modes (distributed, hybrid, or point-to-point) as described in Example 4. (c)
Convergence of $D_{\mathrm{MSE}}(n)$ for five different initializations of the iterative algorithm for the operating point c = Q, $c_{34} = 11$.



C. Quadratic Program with Convex Constraints 

Due to Lemma 1, the weighted MSE is a quadratic function of $\mathbf{t}_i = \mathrm{vec}(\mathbf{T}_i)$ if all other network matrices are
fixed. The optimal $\mathbf{T}_i$ must satisfy block constraints determined by network topology. The block constraints
are linear equality constraints of the form $\mathbf{\Phi}_i\mathbf{t}_i = \boldsymbol{\phi}_i$. For example, if $\mathbf{T}_i$ contains an identity sub-block, this
is enforced by setting entries in $\mathbf{t}_i$ to zero and one accordingly, via linear equality constraints.

Theorem 1 (Optimal Encoding): Let encoding matrices $\mathbf{T}_j$, $j \neq i$, and decoding matrix $\mathbf{B}$ be fixed. Let
$\mathbf{t}_i = \mathrm{vec}(\mathbf{T}_i)$. The optimal encoding transform $\mathbf{t}_i$ is given by the following constrained quadratic program
(QP) Ell Def. 4.34] 

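$$\begin{aligned} \arg\min_{\mathbf{t}_i} \quad & \mathbf{t}_i^T\mathbf{P}_i\mathbf{t}_i + \mathbf{p}_i^T\mathbf{t}_i + p_i \\ \text{s.t.} \quad & \mathbf{\Phi}_i\mathbf{t}_i = \boldsymbol{\phi}_i, \end{aligned} \tag{29}$$

where $(\mathbf{\Phi}_i, \boldsymbol{\phi}_i)$ represent linear equality constraints on elements of $\mathbf{T}_i$. The solution to the above optimization
for $\mathbf{t}_i$ is obtained by solving a corresponding linear system

$$\begin{bmatrix} 2\mathbf{P}_i & \mathbf{\Phi}_i^T \\ \mathbf{\Phi}_i & 0 \end{bmatrix} \begin{bmatrix} \mathbf{t}_i \\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} -\mathbf{p}_i \\ \boldsymbol{\phi}_i \end{bmatrix}. \tag{30}$$

If the constraints determined by the pair $(\mathbf{\Phi}_i, \boldsymbol{\phi}_i)$ are feasible, the linear system of Eqn. (30) is guaranteed
to have either one or infinitely many solutions.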
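Algorithm 1 Ideal-Compression-Estimation($\mathcal{N}$, $\mathbf{W}$, $\epsilon$)
 1: Identify compression matrices $\{\mathbf{T}_i\}_{i=1}^{p}$ and corresponding linear equalities $\{\mathbf{\Phi}_i, \boldsymbol{\phi}_i\}_{i=1}^{p}$ for network $\mathcal{N}$.
    Identify estimation matrices $\{\mathbf{B}_i\}_{i: v_i \in \mathcal{T}}$. [Sec. III, Sec. IV-C]
 2: Initialize $\{\mathbf{T}_i^{(0)}\}_{i=1}^{p}$ randomly to feasible matrices.
 3: Set $n = 1$, $D_{\mathrm{MSE},W}(0) = \infty$.
 4: repeat
 5:   Compute $\{\mathbf{B}_i^{(n)}\}_{i: v_i \in \mathcal{T}}$ given $\{\mathbf{T}_k^{(n-1)}\}_{k=1}^{p}$. [Eqn. (18)]
 6:   for $i = 1 : p$ do
 7:     Compute $\mathbf{T}_i^{(n)}$ given $\{\mathbf{B}_k^{(n)}\}_{k: v_k \in \mathcal{T}}$, $\{\mathbf{T}_k^{(n)}\}_{k=1}^{i-1}$, $\{\mathbf{T}_k^{(n-1)}\}_{k=i+1}^{p}$. [Theorem 1]
 8:   end for
 9:   Compute $D_{\mathrm{MSE},W}(n)$. [Eqn. (19)]
10:   Set $\Delta_{\mathrm{MSE},W} = D_{\mathrm{MSE},W}(n-1) - D_{\mathrm{MSE},W}(n)$.
11:   Set $n = n + 1$.
12: until $\Delta_{\mathrm{MSE},W} < \epsilon$ or $n > N_{\max}$.
13: return $\{\mathbf{T}_i^{(n)}\}_{i=1}^{p}$, $\{\mathbf{B}_i^{(n)}\}_{i: v_i \in \mathcal{T}}$.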



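Proof: The QP of Eqn. (29) follows from Lemma 1 with additional linear equality constraints placed on
$\mathbf{t}_i$. The closed form solution to the QP is derived using Lagrange dual multipliers for the linear constraints,
and the Karush-Kuhn-Tucker (KKT) conditions. Let $f(\mathbf{t}_i, \boldsymbol{\lambda})$ represent the Lagrangian formed with dual
vector variable $\boldsymbol{\lambda}$ for the constraints,

$$f(\mathbf{t}_i, \boldsymbol{\lambda}) = \mathbf{t}_i^T\mathbf{P}_i\mathbf{t}_i + \mathbf{p}_i^T\mathbf{t}_i + p_i + \boldsymbol{\lambda}^T(\mathbf{\Phi}_i\mathbf{t}_i - \boldsymbol{\phi}_i), \tag{31}$$

$$\nabla_{\mathbf{t}_i} f(\mathbf{t}_i, \boldsymbol{\lambda}) = 2\mathbf{P}_i\mathbf{t}_i + \mathbf{p}_i + \mathbf{\Phi}_i^T\boldsymbol{\lambda}, \tag{32}$$

$$\nabla_{\boldsymbol{\lambda}} f(\mathbf{t}_i, \boldsymbol{\lambda}) = \mathbf{\Phi}_i\mathbf{t}_i - \boldsymbol{\phi}_i. \tag{33}$$

Setting $\nabla_{\mathbf{t}_i} f(\mathbf{t}_i, \boldsymbol{\lambda}) = 0$ and $\nabla_{\boldsymbol{\lambda}} f(\mathbf{t}_i, \boldsymbol{\lambda}) = 0$ yields the linear system of Eqn. (30), the solutions to which are
$\mathbf{t}_i$ and dual vector $\boldsymbol{\lambda}$. Since the MSE distortion is bounded by a minimum of zero error, the linear system
has a unique solution if $\mathbf{P}_i$ is full rank, or infinitely many solutions of equivalent objective value if $\mathbf{P}_i$ is
singular. ■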
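
The KKT system of Theorem 1 is straightforward to solve numerically. The sketch below is one possible implementation, not the authors' code: the function name `solve_equality_qp`, the random problem data, and the two example constraints are all assumptions made for illustration. A least-squares solve is used so that a singular KKT matrix still returns one of the equivalent solutions.

```python
import numpy as np

def solve_equality_qp(P, p, Phi, phi):
    """Minimise t^T P t + p^T t subject to Phi t = phi via the KKT linear system."""
    m = Phi.shape[0]
    kkt = np.block([[2 * P, Phi.T],
                    [Phi, np.zeros((m, m))]])
    rhs = np.concatenate([-p, phi])
    # lstsq returns a minimum-norm solution, which also covers a singular KKT matrix.
    sol = np.linalg.lstsq(kkt, rhs, rcond=None)[0]
    return sol[:P.shape[0]], sol[P.shape[0]:]

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4))
P = M @ M.T                            # positive definite here (hypothetical data)
p = rng.standard_normal(4)
Phi = np.array([[1.0, 0.0, 0.0, 0.0],  # pin t[0] = 1, e.g. an entry of an identity sub-block
                [0.0, 0.0, 0.0, 1.0]]) # pin t[3] = 0, e.g. a structural zero
phi = np.array([1.0, 0.0])

def objective(t):
    return t @ P @ t + p @ t

t_opt, lam = solve_equality_qp(P, p, Phi, phi)
stationarity = 2 * P @ t_opt + p + Phi.T @ lam

# A feasible competitor: move only along the unconstrained coordinates.
t_other = t_opt + np.array([0.0, 0.3, -0.2, 0.0])
print(objective(t_opt), objective(t_other))
```

The constant term of the objective is omitted in `objective` because it does not affect the minimizer.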
Remark 3: Beyond linear constraints, several other convex constraints on matrix variables could be applied
within the quadratic program. For example, the $\ell_1$-norm of a vector $\mathbf{x} \in \mathbb{R}^n$ defined by $\|\mathbf{x}\|_1 = \sum_{i=1}^{n} |x_i|$ is
often used in compressed sensing to enforce sparsity.
TABLE I
A "Hybrid" Linear Transform Network

Network Modes     Bandwidth
Distributed
Hybrid            < c < c_34
Point to Point    c_34 < c



D. An Iterative Algorithm 

Algorithm 1 defines an iterative method to optimize all encoding matrices $\{\mathbf{T}_i\}_{i=1}^{p}$ and the global decoding
matrix $\mathbf{B}$ for an LTN graph. The iterative algorithm begins with the random initialization of the encoding
matrices $\{\mathbf{T}_i\}_{i=1}^{p}$ subject to size specifications and linear equality constraints given by $\{\mathbf{\Phi}_i\}_{i=1}^{p}$ and $\{\boldsymbol{\phi}_i\}_{i=1}^{p}$.
The iterative method proceeds by solving for the optimal $\mathbf{B}$ transform first. Similarly, with $\mathbf{T}_j$, $j \neq i$, and $\mathbf{B}$
fixed, the optimal $\mathbf{T}_i$ is computed using Theorem 1. The iterative method proceeds for $n \leq N_{\max}$ iterations
or until the difference in error $\Delta_{\mathrm{MSE},W}$ is less than a prescribed tolerance $\epsilon$.

E. Convergence to Stationary Points 

A key property of Algorithm 1 is the convergence to a stationary point (either local minimum or saddle-point) of the weighted MSE.

Theorem 2 (Local Convergence): Denote the network's linear transfer function after the $n$-th outer-loop iteration in Algorithm 1 by $T^{(n)}$, and the block-diagonal global decoding transform by $B^{(n)}$, which contains matrices $\{B_i^{(n)}\}_{i: v_i \in \mathcal{T}}$ on the diagonal. Let $\hat{r}^{(n)} = B^{(n)} T^{(n)} x$ denote the estimate of desired signal $r$. Then

$$E\left[\|r - \hat{r}^{(n)}\|_W^2\right] \geq E\left[\|r - \hat{r}^{(n+1)}\|_W^2\right], \tag{34}$$

i.e., the weighted MSE distortion is a nonincreasing function of the iteration number $n$.

Proof: In Step 5 of Algorithm 1, with matrices $\{T_k^{(n-1)}\}_{k=1}^{d}$ fixed, the optimal transform $B^{(n)}$ is determined to minimize $D_{MSE,W}$. The current transform $B^{(n-1)}$ is feasible within the optimization space, which implies that the MSE distortion cannot increase. In Step 7 of the inner loop, with matrices $B^{(n)}$, $\{T_k^{(n)}\}_{k=1}^{i-1}$, and $\{T_k^{(n-1)}\}_{k=i+1}^{d}$ fixed, Theorem 1 computes the optimal transform $T_i^{(n)}$ to minimize $D_{MSE,W}$. A similar argument shows that the error term cannot increase. The distortion sequence $\{D_{MSE,W}(n)\}$ is nonincreasing and nonnegative; hence $\lim_{n \to \infty} D_{MSE,W}(n) = \inf\{D_{MSE,W}(n)\}$ by monotone convergence. ■
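The monotonicity argument can be observed numerically on the simplest instance. The sketch below is an illustration under assumptions, not the paper's Algorithm 1: it takes a single noiseless link with $r = x$, $W = I$, no equality constraints, and one encoding matrix $T$, so both alternating steps have closed forms. The names `distortion` and `alternate` and the random covariance are hypothetical.

```python
import numpy as np

def distortion(B, T, Sigma):
    """MSE E||x - B T x||^2 for zero-mean x with covariance Sigma."""
    E = np.eye(Sigma.shape[0]) - B @ T
    return float(np.trace(E @ Sigma @ E.T))

def alternate(Sigma, c, n_iter=50, seed=0):
    """Alternate exact minimizations over B (decoder) and T (encoder)."""
    rng = np.random.default_rng(seed)
    n = Sigma.shape[0]
    T = rng.standard_normal((c, n))          # random initialization
    history = []
    for _ in range(n_iter):
        # B-step: LLSE decoder for the current encoder.
        B = Sigma @ T.T @ np.linalg.inv(T @ Sigma @ T.T)
        history.append(distortion(B, T, Sigma))
        # T-step: unconstrained least-squares encoder for the current decoder.
        T = np.linalg.solve(B.T @ B, B.T)
        history.append(distortion(B, T, Sigma))
    return history

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
Sigma = A @ A.T + np.eye(6)                  # illustrative covariance
history = alternate(Sigma, c=2)
# KLT benchmark: sum of the n - c smallest eigenvalues of Sigma.
klt_bound = np.linalg.eigvalsh(Sigma)[:4].sum()
```

Each entry of `history` is no larger than the one before it, and none falls below `klt_bound`, which is the behavior the proof predicts for this special case.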
Remark 4: The local convergence in Theorem 2 is affected by several factors: (i) The covariance structure of the source; (ii) The DAG structure of $G$; (iii) The schedule of iterative optimization of local matrices and factorization of $T$ into the $T_i$; (iv) The random initialization of $\{T_i\}_{i=1}^{d}$. In practice, multiple executions of Algorithm 1 increase the probability of converging to a global minimum.

F. Example: A Multi-Hop Network 

Consider the noiseless multi-hop network of Fig. 3 in which a relay aggregates, compresses and/or forwards its observations to a receiver. The network is a hybrid combination of a distributed and point-to-point network.

Example 4 ("Hybrid Network"): High-dimensional, correlated signals $x_1 \in \mathbb{R}^{n_1}$ and $x_2 \in \mathbb{R}^{n_2}$ are observed at nodes $v_1$ and $v_2$, where $n_1 = n_2 = 15$ dimensions. The covariance $\Sigma_x$ of the global source $x = [x_1; x_2]$ was generated as follows for the experiment, ensuring $\Sigma_x \succ 0$. The diagonal entries of $\Sigma_x$ were selected as $15 + 2U_{ii}$, and off-diagonal entries for $j > i$ were selected as $1 + 2U_{ij}$, where $U_{ii}$ and $U_{ij}$ are i.i.d. uniform random variables over the interval $[0, 1]$.

The linear transfer function is factored in the form $T = T_1 T_2$ where $T_1 = L_{34}$ and

$$T_2 = \begin{bmatrix} L_{13} & 0 \\ 0 & L_{23} \end{bmatrix}.$$

The target reconstruction at $v_4$ is the entire signal $r_4 = x$. The bandwidth $c_{34} = 11$, while bandwidth $c = c_{13} = c_{23}$ is varied for the experiment. Depending on the amount of bandwidth $c$, the network operates in one of the modes given in Table I. Fig. 3(b) plots the sum distortion vs. compression performance, and Fig. 3(c) plots the convergence of Algorithm 1 for the operating point $c = 6$, $c_{34} = 11$.
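The covariance recipe of Example 4 can be sketched in a few lines of NumPy. This is an illustration under assumptions: the function name `example4_covariance` and the seed are hypothetical, the lower triangle is filled by symmetry, and positive definiteness is only checked afterwards rather than enforced, since the text does not say how the authors ensured it.

```python
import numpy as np

def example4_covariance(n=30, seed=0):
    """Random symmetric matrix with diagonal 15 + 2U and off-diagonal 1 + 2U."""
    rng = np.random.default_rng(seed)
    # Draw the strict upper triangle (j > i) and mirror it.
    upper = np.triu(1.0 + 2.0 * rng.uniform(size=(n, n)), k=1)
    Sigma = upper + upper.T
    Sigma[np.diag_indices(n)] = 15.0 + 2.0 * rng.uniform(size=n)
    return Sigma

Sigma_x = example4_covariance()          # n = n_1 + n_2 = 30
min_eig = np.linalg.eigvalsh(Sigma_x).min()
is_positive_definite = bool(min_eig > 0)
```

If a draw failed the check, one would simply redraw; for these entry ranges and $n = 30$ the smallest eigenvalue is expected to stay well above zero.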

V. Noisy Networks 

We now analyze communication for networks with non-ideal channels: $y_{ij} = x_{ij} + z_{ij}$. Edges represent vector Gaussian channels. Network communication is limited according to both bandwidth compression ratios $\alpha_{ij}$ and signal-to-noise ratios $\mathrm{SNR}_{ij}$. We simplify optimization of subspaces by restricting attention to single-layer multi-source, multi-receiver networks for which $\mathcal{V} = \mathcal{S} \cup \mathcal{T}$. In this case, the linear transfer function is $y = Tx + z$, i.e. the noise is additive but not filtered over multiple network layers.

A. MSE Distortion at Receivers 

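Each receiver $v_i \in \mathcal{T}$ receives observations $y_i = G_i x + z_i$ where $z_i$ is the noise to $v_i$. The MSE distortion for reconstructing $r_i$ at receiver $v_i$ is given by,

$$D_i = \mathrm{tr}(\Sigma_{r_i}) - 2\,\mathrm{tr}(B_i G_i \Sigma_{x r_i}) + \mathrm{tr}(B_i \Sigma_{z_i} B_i^T) + \mathrm{tr}(B_i G_i \Sigma_x G_i^T B_i^T). \tag{35}$$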



TO APPEAR IN IEEE TRANSACTIONS ON SIGNAL PROCESSING 






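Setting the matrix derivative with respect to $B_i$ in Eqn. (35) to zero yields the optimal linear transform $B_i$ (cf. the analogous expression for the noiseless case),

$$B_i^{opt} = \Sigma_{r_i x} G_i^T \left[ G_i \Sigma_x G_i^T + \Sigma_{z_i} \right]^{-1}. \tag{36}$$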
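As a numerical sanity check of Eqns. (35)-(36), the sketch below evaluates the receiver distortion and the decoder obtained by setting its derivative in $B_i$ to zero. The function names `receiver_mse` and `llse_decoder`, and the choice $r_i = x$ with identity noise covariance, are assumptions made for illustration only.

```python
import numpy as np

def receiver_mse(B, G, Sigma_x, Sigma_rx, Sigma_r, Sigma_z):
    """Eqn. (35): MSE of the estimate B (G x + z) of the target r."""
    return float(np.trace(Sigma_r)
                 - 2.0 * np.trace(B @ G @ Sigma_rx.T)
                 + np.trace(B @ Sigma_z @ B.T)
                 + np.trace(B @ G @ Sigma_x @ G.T @ B.T))

def llse_decoder(G, Sigma_x, Sigma_rx, Sigma_z):
    """Stationary point of Eqn. (35) in B: Sigma_rx G^T (G Sigma_x G^T + Sigma_z)^{-1}."""
    S_y = G @ Sigma_x @ G.T + Sigma_z        # covariance of the observation
    # Solve S_y X = G Sigma_xr, then transpose (S_y is symmetric).
    return np.linalg.solve(S_y, G @ Sigma_rx.T).T

# Illustrative setup (assumed): r = x, so Sigma_rx = Sigma_r = Sigma_x.
rng = np.random.default_rng(3)
n, c = 4, 2
A = rng.standard_normal((n, n))
Sigma_x = A @ A.T + np.eye(n)
G = rng.standard_normal((c, n))
Sigma_z = np.eye(c)
B_opt = llse_decoder(G, Sigma_x, Sigma_x, Sigma_z)
d_opt = receiver_mse(B_opt, G, Sigma_x, Sigma_x, Sigma_x, Sigma_z)
```

Because Eqn. (35) is a convex quadratic in $B_i$, any perturbation of `B_opt` should give a distortion at least as large as `d_opt`.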



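Combining the LLSE estimates as $\hat{r} = By$, where $y = Tx + z$, the weighted MSE for all receivers is given by

$$D_{MSE,W} = E\left[\|r - \hat{r}\|_W^2\right] = E\left[\|r - B(Tx + z)\|_W^2\right]$$
$$= \mathrm{tr}(W B T \Sigma_x T^T B^T W^T) - 2\,\mathrm{tr}(W B T \Sigma_{xr} W^T) + \mathrm{tr}(W \Sigma_r W^T) + \mathrm{tr}(W B \Sigma_z B^T W^T). \tag{37}$$

By construction of the weighting matrix $W$, the MSE in Eqn. (37) is a weighted sum of individual distortions at receivers, i.e. $D_{MSE,W} = \sum_{i: v_i \in \mathcal{T}} w_i D_i$, where $w_i$ is the weight that $W$ assigns to receiver $v_i$.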



B. Computing Encoding Transform T 

For noisy networks, power constraints on channel inputs limit the amount of amplification of transmitted signals. For single-layer networks, let $v_i \in \mathcal{S}$ be a source node with observed signal $x_i$. A power constraint on the input to channel $(i,j) \in \mathcal{E}$ is given by

$$E\left[\|x_{ij}\|_2^2\right] = E\left[\|L_{ij} x_i\|_2^2\right] = \mathrm{tr}(L_{ij} \Sigma_{x_i} L_{ij}^T) \leq P_{ij}. \tag{38}$$

The power constraint in Eqn. (38) is a quadratic function of the entries of the global linear transform $T$. More precisely, let $\ell_{ij} = \mathrm{vec}(L_{ij})$ and $t = \mathrm{vec}(T)$. Since $t$ contains all variables of $\ell_{ij}$, we may write $\ell_{ij} = J_{ij} t$ where $J_{ij}$ selects variables from $t$. Using the matrix-vector identities of Eqn. (21), the power constraint in Eqn. (38) can be written as

$$\mathrm{tr}(L_{ij} \Sigma_{x_i} L_{ij}^T) = t^T J_{ij}^T (\Sigma_{x_i} \otimes I) J_{ij} t. \tag{39}$$

Letting $\Gamma_{ij} = J_{ij}^T (\Sigma_{x_i} \otimes I) J_{ij}$, the quadratic constraint is $t^T \Gamma_{ij} t \leq P_{ij}$. The matrix $\Gamma_{ij}$ is a symmetric, positive semi-definite matrix. Thus a power constraint is a quadratic, convex constraint.
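The vectorized form of the power constraint can be checked numerically. The sketch below assumes column-stacking for $\mathrm{vec}(\cdot)$ and takes the simplest case in which $T$ consists of a single block $L_{ij}$, so that the selection matrix $J_{ij}$ is the identity; the dimensions and random data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
c, n = 2, 4                                # illustrative link bandwidth and source dimension
L = rng.standard_normal((c, n))            # encoding matrix L_ij
A = rng.standard_normal((n, n))
Sigma = A @ A.T                            # source covariance Sigma_{x_i}

ell = L.flatten(order="F")                 # vec(L): stack the columns of L
Gamma = np.kron(Sigma, np.eye(c))          # Sigma (x) I, with J_ij = I in this case

power_trace = float(np.trace(L @ Sigma @ L.T))   # tr(L Sigma L^T), Eqn. (38)
power_quad = float(ell @ Gamma @ ell)            # t^T Gamma t, Eqn. (39)
gamma_min_eig = float(np.linalg.eigvalsh(Gamma).min())
```

The two power values agree up to rounding, and the smallest eigenvalue of `Gamma` is nonnegative, which is what makes the constraint convex.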
Algorithm 2 Noisy-Compression-Estimation($\mathcal{N}$, $W$, $\epsilon$)
1: Identify compression matrix $T$ and corresponding linear equality constraints $(\Phi, \phi)$, and quadratic power constraints $\{(\Gamma_{ij}, P_{ij})\}_{(i,j) \in \mathcal{E}}$. Identify estimation matrices $\{B_i\}_{i: v_i \in \mathcal{T}}$. [Sec. III, Sec. V-B]
2: Initialize $T^{(0)}$ randomly to a feasible matrix.
3: Set $n = 1$, $D_{MSE,W}(0) = \infty$.
4: repeat
5:   Compute $\{B_i^{(n)}\}_{i: v_i \in \mathcal{T}}$ given $T^{(n-1)}$. [Eqn. (36)]
6:   Compute $T^{(n)}$ given $\{B_i^{(n)}\}_{i: v_i \in \mathcal{T}}$, $(\Phi, \phi)$, $\{(\Gamma_{ij}, P_{ij})\}_{(i,j) \in \mathcal{E}}$. [Theorem 3]
7:   Compute $D_{MSE,W}(n)$. [Eqn. (37)]
8:   Set $\Delta_{MSE,W} = D_{MSE,W}(n-1) - D_{MSE,W}(n)$.
9:   Set $n = n + 1$.
10: until $\Delta_{MSE,W} < \epsilon$ or $n > N_{max}$.
11: return $T^{(n)}$ and $\{B_i^{(n)}\}_{i: v_i \in \mathcal{T}}$.



C. Quadratic Program with Convex Constraints 

As in Section IV-B, we use the vector form $t = \mathrm{vec}(T)$ to enforce linear equality constraints $\Phi t = \phi$. For noisy networks, we include power constraints $t^T \Gamma_{ij} t \leq P_{ij}$ for each channel $(i,j) \in \mathcal{E}$. For a fixed global decoding transform $B$, the distortion $D_{MSE,W}$ of Eqn. (37) is again a quadratic function of $t$. Using the compact notation

$$p = \mathrm{tr}(W \Sigma_r W^T) + \mathrm{tr}(W B \Sigma_z B^T W^T), \tag{40}$$
$$\mathbf{p} = -2\,\mathrm{vec}(B^T W^T W \Sigma_{rx}), \tag{41}$$
$$\mathbf{P} = \Sigma_x \otimes B^T W^T W B, \tag{42}$$

a derivation identical to that of Lemma 1 yields $D_{MSE,W} = t^T \mathbf{P} t + \mathbf{p}^T t + p$. The optimal encoding transform $T$ for single-layer noisy networks is solvable via a quadratic program with quadratic constraints (QCQP), following the development of Eqns. (40)-(42), and the power constraints given in Eqns. (38)-(39); cf. Theorem 1.
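The claim that the distortion of Eqn. (37) is a quadratic function of $t = \mathrm{vec}(T)$ can be verified numerically. The sketch below is a check under assumptions: $\mathrm{vec}(\cdot)$ stacks columns, the quadratic coefficient is taken as the Kronecker product $\Sigma_x \otimes B^T W^T W B$ (the form that column-stacking produces), $r = x$, and all matrices are random illustrative data. The names `p_vec` and `p0` stand for the vector and scalar coefficients.

```python
import numpy as np

rng = np.random.default_rng(2)
n, c, r = 4, 2, 4
A = rng.standard_normal((n, n))
Sigma_x = A @ A.T
Sigma_r, Sigma_rx = Sigma_x, Sigma_x       # assumed target r = x
Sigma_z = np.eye(c)
W = np.diag(rng.uniform(0.5, 2.0, size=r)) # illustrative weighting matrix
B = rng.standard_normal((r, c))
T = rng.standard_normal((c, n))

# Direct evaluation of the four trace terms in Eqn. (37).
direct = (np.trace(W @ B @ T @ Sigma_x @ T.T @ B.T @ W.T)
          - 2.0 * np.trace(W @ B @ T @ Sigma_rx.T @ W.T)
          + np.trace(W @ Sigma_r @ W.T)
          + np.trace(W @ B @ Sigma_z @ B.T @ W.T))

# Quadratic-form coefficients in the spirit of Eqns. (40)-(42).
M = B.T @ W.T @ W @ B
P = np.kron(Sigma_x, M)
p_vec = -2.0 * (B.T @ W.T @ W @ Sigma_rx).flatten(order="F")
p0 = np.trace(W @ Sigma_r @ W.T) + np.trace(W @ B @ Sigma_z @ B.T @ W.T)

t = T.flatten(order="F")                   # vec(T), column-stacking
quadratic = float(t @ P @ t + p_vec @ t + p0)
```

Agreement between `direct` and `quadratic` confirms that, once $B$ is fixed, optimizing $T$ is a quadratic problem in $t$.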

Theorem 3 (Optimal Encoding T for Noisy LTN): Let $\mathcal{N}$ be a single-layer LTN, $B$ be the fixed decoding transform, and $t = \mathrm{vec}(T)$ be the encoding transform. The optimal encoding $t$ is the solution to the following quadratic program with quadratic constraints (QCQP):

$$\arg\min_{t} \; t^T \mathbf{P} t + \mathbf{p}^T t + p \tag{43}$$
$$\text{s.t.} \quad \Phi t = \phi,$$
$$\qquad t^T \Gamma_{ij} t \leq P_{ij}, \quad (i,j) \in \mathcal{E},$$

where $(\Phi, \phi)$ represent linear equality constraints (dictated by network topology), and $\{(\Gamma_{ij}, P_{ij})\}_{(i,j) \in \mathcal{E}}$ represent quadratic power constraints on variables of $T$.

Fig. 4. A block diagram of a distributed, noise/power limited LTN. Each source node transmits signal projections of a vector $x_i \in \mathbb{R}^4$ to a decoder over a vector additive white Gaussian noise (AWGN) channel.

Remark 5: A quadratic program with linear and convex quadratic constraints is solvable efficiently via 
standard convex program solvers; the time complexity depends polynomially on the number of matrix 
variables and constraints. 

D. Iterative Algorithm and Convergence 

Algorithm 2 defines an iterative algorithm for single-layer, noise/power limited networks. In addition to subspace selection, the amount of power per subspace is determined iteratively. The iterative method alternates between optimizing the global decoding transform $B$ and the global encoding transform $T$, ensuring that network topology and power constraints are satisfied. As in Theorem 2, the weighted MSE distortion is a nonincreasing function of the iteration number, i.e. $D_{MSE,W}(n) \geq D_{MSE,W}(n+1)$. While convergence to a stationary point is guaranteed, the optimization space is highly complex; a globally optimal solution is not guaranteed.
Fig. 5. (a) Power-compression-distortion "spectra" of the network for varying compression ratios $\alpha$ and SNR levels. The (red, unmarked) dashed lines represent cut-set lower bounds on achievable MSE distortions for linear coding based on convex relaxations discussed in Section VI-E. (b) Cut-set lower bounds (information theory): for $\alpha \in \{0.25, 1.0\}$, the performance of linear coding is compared with information-theoretic cut-set bounds (described in Section VI-F). In the high-SNR setting, information-theoretic coding strategies are capable of zero-distortion; however, in the low-SNR setting, linear coding achieves a competitive MSE performance while maintaining zero-delay and low-complexity.

E. Example: A Distributed Noisy Network 

Fig. 4 diagrams a classic example of a distributed network with multiple source (sensor) nodes transmitting signal projections to a central decoder. Each source node is power constrained and must transmit a compressed description of its observed signal over a noisy vector channel.
Example 5 (Distributed LTN): In Fig. 4, the global source $x = [x_1; x_2; x_3]$ is chosen to be a jointly Gaussian vector with $n = 12$ dimensions, and $n_i = 4$ for each of $|\mathcal{S}| = 3$ source nodes. Here, we specify the exact distribution of $x$ in order to provide information-theoretic lower bounds. We set the covariance of $x$ to be Gauss-Markov with $\rho = 0.8$,

$$\Sigma_x = \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{11} \\ \rho & 1 & \rho & \cdots & \rho^{10} \\ \vdots & & \ddots & & \vdots \\ \rho^{11} & \rho^{10} & \cdots & \rho & 1 \end{bmatrix}.$$

The network structure is specified by bandwidths $c_{14} = c_{24} = c_{34} = c$. The global encoding transform $T$ is block-diagonal with matrices $L_{14}$, $L_{24}$, and $L_{34}$ on the diagonal. The compression ratio is varied equally for each source node, $\alpha = c / n_i$ where $n_i = 4$. The noise variables $z_{ij}$ are i.i.d. Gaussian random vectors with zero-mean and identity covariances. The power constraints are set as $P_1 = P_2 = P_3 = c\,(\mathrm{SNR})$, where $\mathrm{SNR}_{ij} = \mathrm{SNR}$ for all links. The goal of destination $v_4$ is to reconstruct the entire source $r_4 = x$. Fig. 5(a) plots the performance of LTN optimization for varying $\alpha$ and SNR ratios as well as cut-set lower bounds for linear coding based on convex relaxations. Cut-set lower bounds for linear coding for this example are explained further in Section VI-E. Fig. 5(b) plots cut-set bounds based on information theory which are explained further in Sections VI-F and VI-G.
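The source model of Example 5 is easy to reproduce. The sketch below builds a Gauss-Markov covariance with entries $\rho^{|i-j|}$, which is the usual meaning of the term and is assumed here, and derives the per-link power from a chosen bandwidth and SNR. The function name and the particular values of `c` and `snr` are illustrative.

```python
import numpy as np

def gauss_markov_covariance(n=12, rho=0.8):
    """Toeplitz covariance with entries rho^{|i - j|}."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

Sigma_x = gauss_markov_covariance()
Sigma_11 = Sigma_x[:4, :4]          # covariance of x_1 (n_i = 4)

c, snr = 2, 10.0                    # illustrative bandwidth and linear SNR
alpha = c / 4                       # compression ratio alpha = c / n_i
P_link = c * snr                    # power constraint per source, P_i = c * SNR
```

Slicing `Sigma_x` into 4-by-4 blocks gives the per-source covariances and cross-covariances needed by the block-diagonal encoder.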

Remark 6 (Comparison with [5], [6]): For this example, as the SNR $\to \infty$, the error $D_{MSE}$ approaches the error associated to the distributed KLT [5] where channel noise was not considered. In [6], the authors model the effects of channel noise; however, they do not provide cut-set lower bounds. In addition, the iterative optimization of the present paper optimizes all compression matrices simultaneously per iteration and allows arbitrary convex constraints, as opposed to the schemes in both [5], [6], which optimize the encoding matrix of each user separately per iteration.

VI. Cut-Set Lower Bounds 

In this section, we derive lower bounds on the minimum MSE distortion possible for linear compression and estimation of correlated signals in the LTN model. Our main technique is to relax an arbitrary acyclic graph along all possible graph cuts to point-to-point networks with side information. The cut-set bounds provide a performance benchmark for the iterative methods of Sections IV-V.
A. Point-to-Point Network with Side Information

Consider the point-to-point network of Fig. 6. Source node $v_1$ compresses source $x \in \mathbb{R}^n$ via a linear transform $L_{12}$. The signal $x_{12} \in \mathbb{R}^{c_{12}}$ is transmitted where $x_{12} = L_{12} x$ and $E[\|x_{12}\|_2^2] \leq P$. Receiver $v_2$ computes a linear estimate of desired signal $r \in \mathbb{R}^r$ using observations $y_{12} = x_{12} + z$ and side information $s \in \mathbb{R}^k$ as follows,

$$\hat{r} = B \begin{bmatrix} y_{12} \\ s \end{bmatrix} = \begin{bmatrix} B_{11} & B_{12} \end{bmatrix} \begin{bmatrix} y_{12} \\ s \end{bmatrix}. \tag{44}$$

The decoding transform $B$ is here partitioned into two sub-matrices $B_{11}$ and $B_{12}$. We will find it convenient to define the following random vectors,

$$\xi = x - \Sigma_{xs} \Sigma_s^{-1} s, \tag{45}$$
$$\nu = r - \Sigma_{rs} \Sigma_s^{-1} s. \tag{46}$$

Signals $\xi$ and $\nu$ are innovation vectors. For example, $\xi$ is the difference between $x$ and the linear least squares estimate of $x$ given $s$, which is equivalent to $\Sigma_{xs} \Sigma_s^{-1} s$.
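A short worked derivation, using only the definitions in Eqns. (45)-(46) and assuming zero-mean vectors with invertible $\Sigma_s$, shows why the innovations are uncorrelated with the side information and gives the covariances that the later closed-form results are stated in. The symbols $\Sigma_\xi$, $\Sigma_\nu$, and $\Sigma_{\nu\xi}$ denote the covariance of $\xi$, the covariance of $\nu$, and their cross-covariance.

```latex
\begin{align*}
E[\xi s^T] &= \Sigma_{xs} - \Sigma_{xs}\Sigma_s^{-1}\Sigma_s = 0, \\
E[\nu s^T] &= \Sigma_{rs} - \Sigma_{rs}\Sigma_s^{-1}\Sigma_s = 0, \\
\Sigma_\xi &= \Sigma_x - \Sigma_{xs}\Sigma_s^{-1}\Sigma_{sx}, \\
\Sigma_\nu &= \Sigma_r - \Sigma_{rs}\Sigma_s^{-1}\Sigma_{sr}, \\
\Sigma_{\nu\xi} &= \Sigma_{rx} - \Sigma_{rs}\Sigma_s^{-1}\Sigma_{sx}.
\end{align*}
```

When there is no side information, the correction terms vanish and $\xi = x$, $\nu = r$.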



B. Case I: Ideal Vector Channel 

In the ideal case, $P = \infty$ or $z = 0$. The weighted, linear minimum MSE distortion of the point-to-point network with side information is obtained by solving

$$D^*_{ideal} = \min_{L_{12}, B} E\left[\|r - \hat{r}\|_W^2\right] = \min_{L_{12}, B_{11}, B_{12}} E\left[\|r - (B_{11} L_{12} x + B_{12} s)\|_W^2\right]. \tag{47}$$

The following theorem specifies the solution to Eqn. (47).

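Theorem 4 (Ideal Network Relaxation): Let $x \in \mathbb{R}^n$, $s \in \mathbb{R}^k$, and $r \in \mathbb{R}^r$ be zero-mean random vectors with given full-rank covariance matrices $\Sigma_x$, $\Sigma_s$, $\Sigma_r$ and cross-covariances $\Sigma_{rx}$, $\Sigma_{rs}$, $\Sigma_{xs}$. Let $\xi$ and $\nu$ be the innovations defined in Eqn. (45) and Eqn. (46) respectively. The solution to the minimization of Eqn. (47) over matrices $L_{12} \in \mathbb{R}^{c_{12} \times n}$, $B_{11} \in \mathbb{R}^{r \times c_{12}}$, and $B_{12} \in \mathbb{R}^{r \times k}$ is obtained in closed form as

$$D^*_{ideal} = \mathrm{tr}(\Sigma_\nu W^T W) - \sum_{j=1}^{c_{12}} \lambda_j, \tag{48}$$

where $\{\lambda_j\}_{j=1}^{c_{12}}$ are the $c_{12}$ largest eigenvalues of the matrix $W \Sigma_{\nu\xi} \Sigma_\xi^{-1} \Sigma_{\xi\nu} W^T$.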

Proof: The optimization in Eqn. (47) is simplified by first determining the LMMSE optimal $B_{12}$ transform in terms of $B_{11}$ and $L_{12}$: $B_{12}^{opt} = \Sigma_{rs} \Sigma_s^{-1} - B_{11} L_{12} \Sigma_{xs} \Sigma_s^{-1}$. Plugging $B_{12}^{opt}$ into Eqn. (47) yields a minimization over $B_{11}$ and $L_{12}$ only. By grouping and rearranging variables in terms of innovation vectors $\xi$ and $\nu$,

$$D^*_{ideal} = \min_{L_{12}, B_{11}} E\left[\|\nu - B_{11} L_{12} \xi\|_W^2\right]. \tag{49}$$

The optimization of Eqn. (49) is that of an equivalent point-to-point network with input signal $\xi$ and desired reconstruction $\nu$, without side information. Eqn. (49) is in standard form and solvable using canonical correlation analysis as detailed in [34, p. 368]. The optimal value $D^*_{ideal}$ is given in Eqn. (48) in terms of the eigenvalues of $W \Sigma_{\nu\xi} \Sigma_\xi^{-1} \Sigma_{\xi\nu} W^T$. ■

Fig. 6. A point-to-point network with side information $s$ at the receiver. In the case of additive noise $z$, the input to the channel is power-constrained so that $E[\|x_{12}\|_2^2] \leq P$.
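The closed form of Eqn. (48) is straightforward to evaluate. The sketch below is an illustration under assumptions: the function name `ideal_distortion` is hypothetical, and the test case takes no side information with $r = x$ and $W = I$, in which case the innovations reduce to $x$ itself and the result should match the classical KLT distortion (the sum of the smallest $n - c_{12}$ eigenvalues of $\Sigma_x$).

```python
import numpy as np

def ideal_distortion(Sigma_nu, Sigma_nu_xi, Sigma_xi, W, c):
    """Eqn. (48): tr(W Sigma_nu W^T) minus the c largest eigenvalues of
    W Sigma_nu_xi Sigma_xi^{-1} Sigma_xi_nu W^T."""
    M = W @ Sigma_nu_xi @ np.linalg.solve(Sigma_xi, Sigma_nu_xi.T) @ W.T
    # Symmetrize before the eigen-decomposition to suppress rounding asymmetry.
    eigs = np.sort(np.linalg.eigvalsh((M + M.T) / 2.0))[::-1]
    return float(np.trace(W @ Sigma_nu @ W.T) - eigs[:c].sum())

# Illustrative check (assumed data): no side information, r = x, W = I.
rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n))
Sigma_x = A @ A.T + np.eye(n)
W = np.eye(n)
d_c2 = ideal_distortion(Sigma_x, Sigma_x, Sigma_x, W, c=2)
klt_c2 = float(np.linalg.eigvalsh(Sigma_x)[: n - 2].sum())
d_full = ideal_distortion(Sigma_x, Sigma_x, Sigma_x, W, c=n)
```

With side information present, the same function applies after replacing the three covariances by those of the innovations $\xi$ and $\nu$.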

C. Case II: Additive Noise and Power Constraints 

In the case of additive noise $z$ (here with assumed covariance $\Sigma_z = I$ for compactness) and a power-constrained input to the vector channel, the weighted, linear minimum MSE distortion is obtained by solving

$$D^*_{noisy} = \min_{L_{12}, B_{11}, B_{12}} E\left[\|r - (B_{11}(L_{12} x + z) + B_{12} s)\|_W^2\right]$$
$$\text{s.t.} \quad \mathrm{tr}[L_{12} \Sigma_x L_{12}^T] \leq P. \tag{50}$$

Again, by solving for the optimal LMMSE matrix $B_{12}$ and grouping terms in the resulting optimization according to innovation vectors $\xi$ and $\nu$,

$$D^*_{noisy} = \min_{L_{12}, B_{11}} E\left[\|\nu - B_{11}(L_{12} \xi + z)\|_W^2\right]$$
$$\text{s.t.} \quad \mathrm{tr}[L_{12} \Sigma_x L_{12}^T] \leq P. \tag{51}$$

Remark 7: The exact solution to Eqn. (51) involves handling a quadratic power constraint and a rank constraint due to the reduced-dimensionality of $L_{12}$. In [6, Theorem 4], a related optimization problem was solved via a Lagrangian relaxation. For our problem, we take a simpler approach using a semi-definite programming (SDP) relaxation. We first note that $D^*_{noisy} \geq D^*_{ideal}$. In the high-SNR regime, the two distortion values are asymptotically equivalent. Therefore, we compute a good approximation for the distortion $D^*_{noisy}$ in the low-SNR regime via the following SDP relaxation.

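Theorem 5 (SDP Relaxation): Consider random vectors $x$, $s$, $r$, $\xi$, $\nu$, and matrices $L_{12}$, $B_{11}$ as defined in Theorem 4. In addition, let random vector $z$ have zero-mean and covariance $\Sigma_z = I$. Let $\Psi = L_{12}^T L_{12}$ and $\Phi \in \mathbb{R}^{r \times r}$ be an arbitrary positive semi-definite matrix where $r$ is the dimension of random vector $r$. The following lower bound applies,

$$D^*_{noisy} \geq \min_{\Phi, \Psi} \; \mathrm{tr}[\Phi] + \mathrm{tr}\left[W \left(\Sigma_\nu - \Sigma_{\nu\xi} \Sigma_\xi^{-1} \Sigma_{\xi\nu}\right) W^T\right]$$
$$\text{s.t.} \quad \mathrm{tr}[\Sigma_x \Psi] \leq P, \quad \Psi \succeq 0,$$
$$\begin{bmatrix} \Phi & W \Sigma_{\nu\xi} \Sigma_\xi^{-1} \\ \Sigma_\xi^{-1} \Sigma_{\xi\nu} W^T & \Sigma_\xi^{-1} + \Psi \end{bmatrix} \succeq 0. \tag{52}$$

The proof of Theorem 5 is based on a rank relaxation as detailed in the Appendix. The power constraint is still enforced in Eqn. (52). In the low-SNR regime, power allocation over subspaces dominates the error performance. If we denote the solution to the SDP of Theorem 5 as $D^*_{sdp}$, we arrive at the following characterization,

$$D^*_{noisy} \geq \max\{D^*_{ideal}, D^*_{sdp}\}. \tag{53}$$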

D. Cut-Set Lower Bounds for Linear Coding 

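Consider an LTN graph $\mathcal{N}$ with source nodes $\mathcal{S} \subset \mathcal{V}$ and receivers $\mathcal{T} \subset \mathcal{V}$. We assume that $\mathcal{S} \cap \mathcal{T} = \emptyset$, i.e. the set of sources and receivers are disjoint. The total bandwidth and total power across a cut $\mathcal{F} \subset \mathcal{V}$ are defined respectively as

$$C(\mathcal{F}) = \sum_{jk \in \mathcal{E}:\; j \in \mathcal{F},\, k \in \mathcal{F}^c} c_{jk}, \tag{54}$$

$$P(\mathcal{F}) = \sum_{jk \in \mathcal{E}:\; j \in \mathcal{F},\, k \in \mathcal{F}^c} P_{jk}, \tag{55}$$

where the edge set $\mathcal{E}$ and bandwidths $c_{jk}$ were defined in Section II. The edges of the graph are directed, hence the bandwidth across a cut accounts for the $c_{ij}$ only for those edges directed from node $v_i$ to $v_j$. In the following theorem, the notation $x_{\mathcal{F}}$ denotes the concatenation of vectors $x_i : v_i \in \mathcal{F}$. The set $\mathcal{F}^c$ denotes the complement of $\mathcal{F}$ in $\mathcal{V}$.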
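The cut quantities of Eqns. (54)-(55) reduce to sums over edges leaving the cut. The sketch below is a minimal illustration: the edge-dictionary representation, the function name `cut_totals`, and the numeric values of bandwidth and SNR are assumptions, applied to the three-source topology of Example 5.

```python
def cut_totals(edges, cut):
    """Total bandwidth C(F) and power P(F) over edges directed from F to its complement.

    edges: dict mapping a directed edge (j, k) to a pair (c_jk, P_jk).
    cut:   set of node indices forming F.
    """
    crossing = [(j, k) for (j, k) in edges if j in cut and k not in cut]
    total_bandwidth = sum(edges[e][0] for e in crossing)
    total_power = sum(edges[e][1] for e in crossing)
    return total_bandwidth, total_power

# Distributed network of Example 5 with illustrative values c = 2, SNR = 10.
c, snr = 2, 10.0
edges = {(1, 4): (c, c * snr), (2, 4): (c, c * snr), (3, 4): (c, c * snr)}
C_all, P_all = cut_totals(edges, {1, 2, 3})   # cut separating all sources
C_one, P_one = cut_totals(edges, {1})         # cut isolating source v_1
```

For the cut isolating a single source, the signals of the remaining sources sit on the receiver side and play the role of side information in the relaxed network.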
Definition 10: $D^*_{ideal}[x, r \mid s;\, c, W]$ represents the distortion $D^*_{ideal}$ computed with the weighted norm via $W$ for the ideal point-to-point network with input $x$, bandwidth $c$, reconstruction vector $r$, and side information $s$ at the receiver. Similarly, $D^*_{noisy}[x, r \mid s;\, c, P, W]$ represents the distortion $D^*_{noisy}$ for a noisy point-to-point network with channel-input power constraint $P$ and noise vector $z$ with zero-mean and identity covariance.

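Theorem 6 (Cut-Set Lower Bounds): Let $\mathcal{N}$ be an arbitrary LTN graph with source nodes $\mathcal{S}$ and receivers $\mathcal{T}$. Let $\mathcal{F} \subset \mathcal{V}$ be a cut of the graph. For ideal channel communication,

$$E\left[\|r_{\mathcal{F}^c} - \hat{r}_{\mathcal{F}^c}\|_W^2\right] \geq D^*_{ideal}\left[x_{\mathcal{F}}, r_{\mathcal{F}^c} \mid x_{\mathcal{F}^c};\, C(\mathcal{F}), W\right]. \tag{56}$$

In the case of noisy channel communication over network $\mathcal{N}$ with additive channel noise $z_{ij}$ (assumed zero-mean, identity covariance),

$$E\left[\|r_{\mathcal{F}^c} - \hat{r}_{\mathcal{F}^c}\|_W^2\right] \geq D^*_{noisy}\left[x_{\mathcal{F}}, r_{\mathcal{F}^c} \mid x_{\mathcal{F}^c};\, C(\mathcal{F}), P(\mathcal{F}), W\right]. \tag{57}$$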

Proof: The LTN graph is partitioned into two sets $\mathcal{F}$ and $\mathcal{F}^c$. The source nodes $v_i \in \mathcal{F}$ are merged as one source "super" node, and the receivers $v_i \in \mathcal{F}^c$ are merged into one receiver "super" node. The maximum bandwidth and maximum power between the source and receiver are $C(\mathcal{F})$ and $P(\mathcal{F})$ respectively. The random vector $x_{\mathcal{F}^c}$ represents those signals with channels to the receiver super node, not accounted for in the cut $\mathcal{F}$; hence, this information is given as side information (a relaxation) to the receiver. The relaxed network after the merging process is the point-to-point network of Fig. 6 with noise $z$ of dimension equal to the bandwidth $C(\mathcal{F})$ of the cut, and provides a lower bound on the MSE distortion $E[\|r_{\mathcal{F}^c} - \hat{r}_{\mathcal{F}^c}\|_W^2]$ at receivers $v_i \in \mathcal{F}^c$. ■
Remark 8: The total number of distinct cuts $\mathcal{F}$ separating sources and receivers is $(2^{|\mathcal{S}|} - 1)(2^{|\mathcal{T}|} - 1)$. For a particular cut, there exists a continuum of lower bounds for multi-receiver networks depending on the choice of weighting $W$.



E. Example: Cut-Set Lower Bounds for Linear Coding 

In Fig. 5(a), cut-set lower bounds for linear coding are illustrated based on Theorem 6 for a distributed noisy network. The bounds are depicted for the cut that separates all sources from the receiver. Due to our approximation method in Eqn. (53) based on the SDP relaxation, the lower bounds show tight agreement in the low-SNR and high-SNR asymptotic regimes.
Fig. 7. (a) Block diagram of a multi-source, multi-destination ideal network. Bandwidths $c_{ij}$ of all links are labeled. Although the graph is symmetric, the source covariance matrix given in Eqn. (63) includes cross-correlations which cause the distortion plots to appear asymmetric. (b) The distortion region assuming that node $v_5$ reconstructs $x_1$, and node $v_6$ reconstructs $x_2$. The cut-set lower bounds are drawn as dotted lines, and the shaded region depicts the achievable points. (c) The distortion region assuming that node $v_5$ reconstructs $x_2$, and node $v_6$ reconstructs $x_1$.



F. Cut-Set Lower Bound From Information Theory 

For the point-to-point communication scenario illustrated in Fig. 6, the information-theoretically optimal performance can be determined precisely. Consider an $\ell$-length sequence $\{(x[t], s[t])\}_{t=1}^{\ell}$ of jointly i.i.d. random vectors. The source node $v_1$ has access to the source sequence $\{x[t]\}_{t=1}^{\ell}$. We will assume throughout that $r$ (respectively $r[t]$) is a deterministic function of $(x, s)$ (respectively $(x[t], s[t])$). The goal of receiver $v_2$ is to minimize the average MSE distortion $D_\ell = E\left[\frac{1}{\ell} \sum_{t=1}^{\ell} \|r[t] - \hat{r}[t]\|_2^2\right]$, where the reconstruction sequence $\{\hat{r}[t]\}_{t=1}^{\ell}$ is generated based on access to side information $\{s[t]\}_{t=1}^{\ell}$ and the sequence of channel output vectors. We study the performance in the limit as $\ell \to \infty$ and denote $D = D_\infty$.

1) Source-Channel Separation: We establish a lower bound by combining the data processing inequality with the definitions of Wyner-Ziv rate-distortion function and channel capacity. Specifically, by straightforward extension of [35], the minimum rate $R(D)$ required to reconstruct $\{r[t]\}_{t=1}^{\ell}$ at distortion $D$ is given by $R(D) = \min I(x; u \mid s)$ where the minimization is over all "auxiliary" random vectors $u$ for which $p(u, x, s) = p(u \mid x)\, p(x, s)$ and for which $E[\|r - E[r \mid u, s]\|_2^2] \leq D$. Furthermore, by definition of the channel capacity $C(P)$ between $v_1$ and $v_2$, $C(P) = \max_{p(x_{12}):\, E[\|x_{12}\|_2^2] \leq P} I(x_{12}; y_{12})$.$^1$ Source-channel separation applies to the scenario of Fig. 6, and in a nearly identical proof as detailed in [36, Thm. 1.10],

$$R(D) \leq C(P). \tag{58}$$

2) R(D) for Jointly Gaussian Sources: If $\{(r[t], x[t], s[t])\}_{t=1}^{\ell}$ form an i.i.d. sequence of jointly Gaussian random vectors, then $R(D)$ is equal to the conditional rate-distortion function [5, Appendix II],

$$R_c(D) = \min_{p(\hat{r} \mid x, s):\; E[\|r - \hat{r}\|_2^2] \leq D} I(x; \hat{r} \mid s). \tag{59}$$

3) Capacity of the Vector AWGN Channel: If the channel noise $z$ is a Gaussian random vector with zero mean and covariance $\Sigma_z = I$, the capacity of the channel in Fig. 6 with bandwidth $c_{12}$ and power constraint $P$ is

$$C(P) = \frac{c_{12}}{2} \log_2\left(1 + \frac{P}{c_{12}}\right). \tag{60}$$



4) Cut-set Bound: We utilize Eqn. (58) to obtain an information-theoretic lower bound to the distortion achievable in any network of the type considered in this paper. An arbitrary graph is reduced via graph cuts to point-to-point networks. The following theorem collects the known information-theoretic results discussed.

Theorem 7 (Cut-Set Bounds: Info. Theory): Let $\mathcal{N}$ be an arbitrary LTN graph with vector AWGN channels. Consider a cut $\mathcal{F} \subset \mathcal{V}$ separating the graph into a point-to-point network with bandwidth $C(\mathcal{F})$ and power $P(\mathcal{F})$. Let $R(D^*_{opt})$ be the rate-distortion function for the source $x_{\mathcal{F}}$ with side information $x_{\mathcal{F}^c}$ and reconstruction $r_{\mathcal{F}^c}$.$^2$ Then

$$R(D^*_{opt}) \leq \frac{C(\mathcal{F})}{2} \log_2\left(1 + \frac{P(\mathcal{F})}{C(\mathcal{F})}\right). \tag{61}$$



$^1$ The notation in information theory vs. signal processing differs. The term $I(x_{12}; y_{12})$ denotes the mutual information between random vectors whereas the term $p(x_{12})$ indicates a probability distribution.
$^2$ We assume that $r_{\mathcal{F}^c}$ is a deterministic function of the global source $x$.

G. Example: Cut-Set Lower Bound From Information Theory

For the noisy network in Example 5, consider cut $\mathcal{F} = \{v_1, v_2, v_3\}$. The source signal $x_{\mathcal{F}} = x = [x_1; x_2; x_3]$ is jointly Gaussian, the side information is absent, and $r_{\mathcal{F}^c} = x$. Denote the eigenvalues of the source $x_{\mathcal{F}}$ as $\{\lambda_{x,i}\}_{i=1}^{n}$. Evaluating Eqn. (59) as in [5, Appendix II], optimal source coding corresponds to reverse water-filling over the eigenvalues (see also [37, Chap. 10]),

$$R_c(D^*_{opt}) = \sum_{i=1}^{n} \max\left\{ \frac{1}{2} \log_2 \frac{\lambda_{x,i}}{D_i},\, 0 \right\}, \tag{62}$$

where

$$D_i = \begin{cases} \theta & \text{if } \theta < \lambda_{x,i}, \\ \lambda_{x,i} & \text{if } \theta \geq \lambda_{x,i}, \end{cases}$$

and where $\theta$ is chosen such that $\sum_{i=1}^{n} D_i = D^*_{opt}$. The lower bound of Eqn. (61) is plotted in Fig. 5(b) for two different bandwidth compression ratios.
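The information-theoretic bound for this example can be evaluated by combining the capacity expression of Eqn. (60) with reverse water-filling. The sketch below is one way to do this under assumptions: the function names, the bisection on the water level $\theta$, and the operating point (per-link bandwidth $c = 1$, linear SNR of 10) are illustrative and not taken from the paper. For the cut separating all three sources, $C(\mathcal{F}) = 3c$ and $P(\mathcal{F}) = 3c \cdot \mathrm{SNR}$.

```python
import numpy as np

def vector_awgn_capacity(bandwidth, power):
    """Eqn. (60): capacity in bits of a unit-noise vector AWGN channel."""
    return 0.5 * bandwidth * np.log2(1.0 + power / bandwidth)

def reverse_waterfill(eigs, theta):
    """Rate (bits) and total distortion at water level theta > 0."""
    D = np.minimum(theta, eigs)
    rate = 0.5 * np.sum(np.log2(eigs / D))
    return float(rate), float(D.sum())

def distortion_lower_bound(eigs, capacity, tol=1e-12):
    """Smallest water-filling distortion whose rate does not exceed `capacity`."""
    lo, hi = 0.0, float(eigs.max())
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        rate, _ = reverse_waterfill(eigs, mid)
        if rate > capacity:
            lo = mid          # rate too high: raise the water level
        else:
            hi = mid
    return reverse_waterfill(eigs, hi)[1]

# Gauss-Markov source of Example 5 (entries rho^{|i-j|} assumed).
idx = np.arange(12)
Sigma_x = 0.8 ** np.abs(idx[:, None] - idx[None, :])
eigs = np.linalg.eigvalsh(Sigma_x)

c, snr = 1, 10.0                                   # illustrative operating point
cap = vector_awgn_capacity(3 * c, 3 * c * snr)     # cut separating all sources
d_bound = distortion_lower_bound(eigs, cap)
```

Sweeping `snr` and `c` in this sketch would trace out curves of the same kind as the information-theoretic bounds described for Fig. 5(b), although the exact plotted values depend on the authors' settings.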



H. Example: Multi-Source, Multi-Receiver Network 

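Example 6 (Multiple Unicast): In Fig. 7, the global source $x = [x_1; x_2]$ where $x_1 \in \mathbb{R}^4$ and $x_2 \in \mathbb{R}^4$. The correlation structure of $x$ is given by the following matrices,

$$\Sigma_x = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} = \left[\begin{array}{cccc|cccc}
2.4 & 1.1 & 0.4 & 0.0 & 0.1 & 0.1 & 0.0 & 0.1 \\
1.1 & 1.7 & 0.8 & 0.4 & 0.0 & 0.2 & 0.2 & 0.1 \\
0.4 & 0.8 & 1.2 & 0.0 & 0.2 & 0.6 & 0.1 & 0.3 \\
0.0 & 0.4 & 0.0 & 0.8 & 0.3 & 0.0 & 0.1 & 0.0 \\ \hline
0.1 & 0.0 & 0.2 & 0.3 & 1.? & 0.1 & 0.2 & 0.0 \\
0.1 & 0.2 & 0.6 & 0.0 & 0.1 & 1.2 & 0.2 & 0.1 \\
0.0 & 0.2 & 0.1 & 0.1 & 0.2 & 0.2 & 1.0 & 0.6 \\
0.1 & 0.1 & 0.3 & 0.0 & 0.0 & 0.1 & 0.6 & 1.2
\end{array}\right]. \tag{63}$$

The network structure is specified by bandwidths $c_{ij}$ as labeled in Fig. 7(a). The factorization of the global linear transform $T$ was given in Example 2 of Section IV.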

The distortion region for the network in the case when node $v_5$ estimates $r_5 = x_1$, and node $v_6$ estimates $r_6 = x_2$ is given in Fig. 7(b). A direct link exists from each source to receiver. However, if the desired reconstruction at the receivers is switched as in Fig. 7(c), the channel from $v_3$ to $v_4$ must be shared fully and becomes a bottleneck. The cut-set bounds of interest are shown in dotted lines. The shaded region depicts the points achievable via the iterative method of Section IV. In Fig. 7(c), the upper and lower bounds are not tight everywhere; even if one receiver is completely ignored, the resulting problem is still a distributed compression problem for which tight bounds are not known. The achievable curve was generated by taking the convex hull of 32 points corresponding to weighting ratios in the interval $[\frac{1}{100}, 100]$.

In Table II we compare the results of linear transform design methods for the minimum sum distortion point (weighting ratio equal to 1).

• Random Projections: Each entry for all compression matrices is selected from the standard normal distribution. The sum distortion $D_5 + D_6$ is averaged over $10^{?}$ random compression matrices selected for all nodes.

• Routing and Network Coding (Ad-Hoc): For the scenario in Fig. 7(b), nodes $v_1$ and $v_2$ project their signal onto the principal eigenvectors of $\Sigma_{11}$ and $\Sigma_{22}$ respectively. Routing permits each receiver to receive the best two eigenvector projections from its corresponding source, as well as an extra projection from the other source. For Fig. 7(c), using a simple "network coding" strategy of adding signals at $v_3$, one receiver is able to receive its best two eigenvector projections, but the other receiver can only receive one best eigenvector projection.

• Iterative QP Optimization: Linear transforms are designed using the iterative method of Section IV.

• Lower Bound: The minimum sum distortion possible due to the cut-set lower bound of Theorem 6.

TABLE II
Comparison of Reduced-Dimension Linear Transforms

Design Method                   Fig. 7(b): D_5 + D_6    Fig. 7(c): D_5 + D_6
Random Projections              4.3170                  6.3471
Routing and Network Coding      2.7029                  3.8170
Iterative QP Optimization       2.3258                  2.6165
(Lower Bound)                   2.3243                  2.3243

VII. Conclusion 

The linear transform network (LTN) was proposed to model the aggregation, compression, and estimation 
of correlated random signals in directed, acyclic graphs. For both noiseless and noisy LTN graphs, a new 
iterative algorithm was introduced for the joint optimization of reduced-dimension network matrices. Cut-set 
lower bounds were introduced for zero-delay linear coding based on convex relaxations. Cut-set lower bounds 
for optimal coding were introduced based on information-theoretic principles. The compression-estimation 
tradeoffs were analyzed for several example networks. A future challenge remains to compute tighter lower 
bounds and relaxations for non-convex network optimization problems. Reduced-dimension linear transforms 
have potential applications in data fusion and sensor networks. The idea of exploiting correlations between 
network signals to reduce data transmission, and the idea of approximate reconstruction as opposed to exact 
recovery at receivers may lead to further advances in networking. 
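Appendix

Starting from the optimization in Eqn. (51), the LLSE optimal matrix $B_{11}^{opt} = \Sigma_{\nu\xi} L_{12}^T (L_{12} \Sigma_\xi L_{12}^T + I)^{-1}$, assuming $\Sigma_z = I$. Substituting this expression and simplifying the objective function in Eqn. (51),

$$D^*_{noisy} = \min_{L_{12}} \; \mathrm{tr}[W \Sigma_\nu W^T] - \mathrm{tr}\left[W \Sigma_{\nu\xi} L_{12}^T (L_{12} \Sigma_\xi L_{12}^T + I)^{-1} L_{12} \Sigma_{\xi\nu} W^T\right]$$
$$\text{s.t.} \quad \mathrm{tr}[L_{12} \Sigma_x L_{12}^T] \leq P. \tag{64}$$

Applying the Woodbury (matrix-inversion) identity [33, C.4.3] to the objective function and simplifying terms,

$$D^*_{noisy} = \min_{L_{12}} \; \mathrm{tr}[W \Sigma_\nu W^T] - \mathrm{tr}\left[W \Sigma_{\nu\xi} \Sigma_\xi^{-1} \Sigma_{\xi\nu} W^T\right] + \mathrm{tr}\left[W \Sigma_{\nu\xi} \Sigma_\xi^{-1} \left(\Sigma_\xi^{-1} + L_{12}^T L_{12}\right)^{-1} \Sigma_\xi^{-1} \Sigma_{\xi\nu} W^T\right] \tag{65}$$
$$\text{s.t.} \quad \mathrm{tr}[L_{12} \Sigma_x L_{12}^T] \leq P.$$

Introducing a positive semi-definite matrix $\Phi$ such that $\Phi \succeq W \Sigma_{\nu\xi} \Sigma_\xi^{-1} \left[\Sigma_\xi^{-1} + L_{12}^T L_{12}\right]^{-1} \Sigma_\xi^{-1} \Sigma_{\xi\nu} W^T$, written equivalently in Schur-complement form [33, A.5.5], and setting $\Psi = L_{12}^T L_{12} \in \mathbb{R}^{n \times n}$ as a rank $c_{12}$ matrix,

$$D^*_{noisy} = \min_{\Phi, \Psi} \; \mathrm{tr}[\Phi] + \mathrm{tr}\left[W \left(\Sigma_\nu - \Sigma_{\nu\xi} \Sigma_\xi^{-1} \Sigma_{\xi\nu}\right) W^T\right]$$
$$\text{s.t.} \quad \mathrm{tr}[\Sigma_x \Psi] \leq P, \quad \Psi \succeq 0, \quad \mathrm{rank}[\Psi] = c_{12},$$
$$\begin{bmatrix} \Phi & W \Sigma_{\nu\xi} \Sigma_\xi^{-1} \\ \Sigma_\xi^{-1} \Sigma_{\xi\nu} W^T & \Sigma_\xi^{-1} + \Psi \end{bmatrix} \succeq 0. \tag{66}$$

Dropping the rank constraint yields the relaxation of Eqn. (52).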